ETAPS Foreword


Welcome to the proceedings of ETAPS 2018! After a somewhat coldish ETAPS 2017 
in Uppsala in the north, ETAPS this year took place in Thessaloniki, Greece. I am 
happy to announce that this is the first ETAPS with gold open access proceedings. This 
means that all papers are accessible by anyone for free. 

ETAPS 2018 was the 21st instance of the European Joint Conferences on Theory 
and Practice of Software. ETAPS is an annual federated conference established in 
1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. 
Each conference has its own Program Committee (PC) and its own Steering
Committee. The conferences cover various aspects of software systems, ranging from
theoretical computer science to foundations to programming language developments, 
analysis tools, formal approaches to software engineering, and security. Organizing 
these conferences in a coherent, highly synchronized conference program facilitates 
participation in an exciting event, offering attendees the possibility to meet many 
researchers working in different directions in the field, and to easily attend talks of 
different conferences. Before and after the main conference, numerous satellite
workshops take place and attract many researchers from all over the globe.

ETAPS 2018 received 479 submissions in total, 144 of which were accepted, 
yielding an overall acceptance rate of 30%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their peer reviewing efforts, the PC members for their 
contributions, and in particular the PC (co-)chairs for their hard work in running this 
entire intensive process. Last but not least, my congratulations to all authors of the 
accepted papers! 

ETAPS 2018 was enriched by the unifying invited speaker Martin Abadi (Google 
Brain, USA) and the conference-specific invited speakers (FASE) Pamela Zave (AT&T
Labs, USA), (POST) Benjamin C. Pierce (University of Pennsylvania, USA), and 
(ESOP) Derek Dreyer (Max Planck Institute for Software Systems, Germany). Invited 
tutorials were provided by Armin Biere (Johannes Kepler University, Linz, Austria) on 
modern SAT solving and Fabio Somenzi (University of Colorado, Boulder, USA) on 
hardware verification. My sincere thanks to all these speakers for their inspiring and 
interesting talks! 

ETAPS 2018 took place in Thessaloniki, Greece, and was organised by the 
Department of Informatics of the Aristotle University of Thessaloniki. The university 
was founded in 1925 and currently has around 75,000 students; it is the largest
university in Greece. ETAPS 2018 was further supported by the following associations
and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer 
Science), EAPLS (European Association for Programming Languages and Systems), 
and EASST (European Association of Software Science and Technology). The local 
organization team consisted of Panagiotis Katsaros (general chair), Ioannis Stamelos, 
Lefteris Angelis, George Rahonis, Nick Bassiliades, Alexander Chatzigeorgiou, Ezio
Bartocci, Simon Bliudze, Emmanouela Stachtiari, Kyriakos Georgiadis, and Petros 
Stratis (EasyConferences). 

The overall planning for ETAPS is the main responsibility of the Steering
Committee, and in particular of its Executive Board. The ETAPS Steering Committee
consists of an Executive Board and representatives of the individual ETAPS
conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive
Board consists of Gilles Barthe (Madrid), Holger Hermanns (Saarbrücken), Joost-Pieter
Katoen (chair, Aachen and Twente), Gerald Lüttgen (Bamberg), Vladimiro Sassone
(Southampton), Tarmo Uustalu (Tallinn), and Lenore Zuck (Chicago). Other members 
of the Steering Committee are: Wil van der Aalst (Aachen), Parosh Abdulla (Uppsala), 
Amal Ahmed (Boston), Christel Baier (Dresden), Lujo Bauer (Pittsburgh), Dirk Beyer 
(Munich), Mikolaj Bojanczyk (Warsaw), Luis Caires (Lisbon), Jurriaan Hage 
(Utrecht), Rainer Hähnle (Darmstadt), Reiko Heckel (Leicester), Marieke Huisman
(Twente), Panagiotis Katsaros (Thessaloniki), Ralf Küsters (Stuttgart), Ugo Dal Lago
(Bologna), Kim G. Larsen (Aalborg), Matteo Maffei (Vienna), Tiziana Margaria 
(Limerick), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau), 
Andrew M. Pitts (Cambridge), Alessandra Russo (London), Dave Sands (Göteborg), 
Don Sannella (Edinburgh), Andy Schürr (Darmstadt), Alex Simpson (Ljubljana),
Gabriele Taentzer (Marburg), Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas 
Vojnar (Brno), and Lijun Zhang (Beijing). 

I would like to take this opportunity to thank all speakers, attendees, organizers 
of the satellite workshops, and Springer for their support. I hope you all enjoy the 
proceedings of ETAPS 2018. Finally, a big thanks to Panagiotis and his local
organization team for all their enormous efforts that led to a fantastic ETAPS in
Thessaloniki! 


February 2018 Joost-Pieter Katoen 


Preface 


TACAS 2018 is the 24th edition of the International Conference on Tools and
Algorithms for the Construction and Analysis of Systems. TACAS
2018 is part of the 21st European Joint Conferences on Theory and Practice of
Software (ETAPS 2018). The conference is held at the Makedonia Palace hotel in
Thessaloniki, Greece, during April 16–19, 2018.

Conference Description. TACAS is a forum for researchers, developers, and users 
interested in rigorously based tools and algorithms for the construction and analysis of 
systems. The conference aims to bridge the gaps between different communities with 
this common interest and to support them in their quest to improve the utility,
reliability, flexibility, and efficiency of tools and algorithms for building systems. TACAS
solicits five types of submissions: 


— Research papers, identifying and justifying a principled advance to the theoretical 
foundations for the construction and analysis of systems, where applicable
supported by experimental validation

— Case-study papers, reporting on case studies and providing information about the 
system being studied, the goals of the study, the challenges the system poses to 
automated analysis, research methodologies and approaches used, the degree to 
which goals were attained, and how the results can be generalized to other problems 
and domains 

— Regular tool papers, presenting a new tool, a new tool component, or novel 
extensions to an existing tool, with an emphasis on design and implementation 
concerns, including software architecture and core data structures, practical
applicability, and experimental evaluations

— Tool-demonstration papers (6 pages), focusing on the usage aspects of tools 

— Competition-contribution papers (4 pages), focusing on describing
software-verification systems that participated in the International Competition on Software
Verification (SV-COMP), which has been affiliated with our conference since 
TACAS 2012 


New Items in the Call for Papers. There were three new items in the call for papers, 
which we briefly discuss. 


— Focus on Replicability of Research Results. We consider reproducibility of
results to be of the utmost importance for the TACAS community. Therefore, we
encouraged all authors of submitted papers to include support for replicating the 
results of their papers. 

— Limit of 3 Submissions. A change of the TACAS bylaws requires that each
individual author is limited to a maximum of three submissions as an author or
co-author. Authors of co-authored submissions are jointly responsible for respecting 
this policy. In case of violations, all submissions of this (co-)author would be 
desk-rejected. 
— Artifact Evaluation. For the first time, TACAS 2018 included an optional artifact
evaluation (AE) process for accepted papers. An artifact is any additional material 
(software, data sets, machine-checkable proofs, etc.) that substantiates the claims 
made in a paper and ideally makes them fully replicable. The evaluation and 
archival of artifacts improves replicability and traceability for the benefit of future 
research and the broader TACAS community. 


Paper Selection. This year, 154 papers were submitted to TACAS, of which
115 were research papers, 6 case-study papers, 26 regular tool papers, and 7
tool-demonstration papers. After a rigorous review process, with each paper reviewed
by at least 3 program committee (PC) members, followed by an online discussion, the 
PC accepted 35 research papers, 2 case-study papers, 6 regular tool papers, and 2 
tool-demonstration papers (45 papers in total). 

Competition on Software Verification (SV-COMP). TACAS 2018 also hosted the 
7th International Competition on Software Verification (SV-COMP), chaired and 
organized by Tomas Vojnar. The competition again had high participation: 21
verification systems with developers from 11 countries were submitted for the systematic
comparative evaluation, including two submissions from industry. This volume 
includes short papers describing 9 of the participating verification systems. These 
papers were reviewed by a separate program committee (PC); each of the papers was 
assessed by four reviewers. One session in the TACAS program was reserved for the 
presentation of the results: the summary by the SV-COMP chair and the participating 
tools by the developer teams. 

Artifact-Evaluation Process. The authors of each of the 45 accepted papers were 
invited to submit an artifact immediately after the acceptance notification. An artifact 
evaluation committee (AEC), chaired by Arnd Hartmanns and Philipp Wendler, 
reviewed these artifacts, with 2 reviewers assigned to each artifact. The AEC received 
33 artifact submissions, of which 24 were successfully evaluated (73% acceptance rate) 
and have been awarded the TACAS AEC badge, which is added to the title page of the 
respective paper. The AEC used a two-phase reviewing process: Reviewers first
performed an initial check of whether the artifact was technically usable and whether the
accompanying instructions were consistent, followed by a full evaluation of the artifact.
In addition to the textual reviews, reviews also provided scores for consistency,
completeness, and documentation. The main criterion for artifact acceptance was
consistency with the paper, with completeness and documentation being handled in a
more lenient manner as long as the artifact was useful overall. Finally, TACAS
offered authors of all submitted artifacts the possibility to publish and permanently
archive a “camera-ready” version of their artifact on
https://springernature.figshare.com/tacas, with the only requirement being an open
license assigned to the artifact. This possibility was used for 20 artifacts, while 2 more
artifacts were archived independently by the authors.

Acknowledgments. We would like to thank all the people who helped to make 
TACAS 2018 successful. First, the chairs would like to thank the authors for
submitting their papers to TACAS 2018. The reviewers did a great job in reviewing
papers: They contributed informed and detailed reports and took part in the discussions 
during the virtual PC meeting. We also thank the steering committee for their advice. 




Special thanks go to the general chair, Panagiotis Katsaros, and his overall organization 
team, to the chair of the ETAPS 2018 executive board, Joost-Pieter Katoen, who took 
care of the overall organization of ETAPS, to the EasyConferences team for the local
organization, and to the publication team at Springer for solving all the extra problems 
that our introduction of the new artifact-evaluation process caused. 
Abstract. Workflow graphs extend classical flow charts with concurrent
fork and join nodes. They constitute the core of business processing
languages such as BPMN or UML Activity Diagrams. The activities of a
workflow graph are executed by humans or machines, generically called
resources. If concurrent activities cannot be executed in parallel for lack
of resources, the time needed to execute the workflow increases. We study
the problem of computing the minimal number of resources necessary to
fully exploit the concurrency of a given workflow, and execute it as fast
as possible (i.e., as fast as with unlimited resources).

We model this problem using free-choice Petri nets, which are known
to be equivalent to workflow graphs. We analyze the computational
complexity of two versions of the problem: computing the resource and
concurrency thresholds. We use the results to design an algorithm to
approximate the concurrency threshold, and evaluate it on a benchmark suite of
642 industrial examples. We show that it performs very well in practice:
it always provides the exact value, and never takes more than 30 ms for
any workflow, even for those with a huge number of reachable markings.


1 Introduction 


A workflow graph is a classical control-flow graph (or flow chart) extended with 
concurrent fork and join. Workflow graphs represent the core of workflow
languages such as BPMN (Business Process Model and Notation), EPC
(Event-driven Process Chain), or UML Activity Diagrams.

In many applications, the activities of a workflow graph have to
be carried out by a fixed number of resources (for example, a fixed number of
computer cores). Increasing the number of cores can reduce the minimal runtime
of the workflow. For example, consider a simple deterministic workflow (a
workflow without choice or merge nodes), which forks into k parallel activities, all of
duration 1, and terminates after a join. With an optimal assignment of resources
to activities, the workflow takes time k when executed with one resource, time
$\lceil k/2 \rceil$ with two resources, and time 1 with k resources; additional resources
bring no further reduction. We call k the resource threshold. In a deterministic
workflow that forks into two parallel chains of k sequential activities each, one 
resource leads to runtime 2k, and two resources to runtime k. More resources do 
not improve the runtime, and so the resource threshold is 2. Clearly, the resource 
threshold of a deterministic workflow with k activities is a number between 1 
and k. Determining this number can be seen as a scheduling problem. However, 
most scheduling problems assume a fixed number of resources and study how to 
optimize the makespan [11,17], while we study how to minimize the number of 
resources. Other works on resource/machine minimization [5,6] consider interval 
constraints instead of the partial-order constraints given by a workflow graph. 
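For the two deterministic examples above, the runtimes have closed forms, and the resource threshold is the smallest number of resources that matches the runtime under unlimited resources. The following Python sketch assumes unit-duration activities and simply encodes the runtimes stated in the text (⌈k/r⌉ for the fork, and 2k versus k for the two chains); the function names are ours and purely illustrative.

```python
import math
from typing import Callable


def fork_runtime(k: int, r: int) -> int:
    """Runtime of a fork into k unit-duration activities with r resources."""
    return math.ceil(k / r)


def two_chains_runtime(k: int, r: int) -> int:
    """Runtime of two parallel chains of k unit-duration activities each."""
    return 2 * k if r == 1 else k


def resource_threshold(runtime: Callable[[int], int], max_r: int) -> int:
    """Smallest r whose runtime equals the runtime with max_r resources.

    max_r is assumed to be large enough to behave like 'unlimited' resources
    (for a deterministic workflow with n activities, max_r = n suffices).
    """
    fastest = runtime(max_r)
    return next(r for r in range(1, max_r + 1) if runtime(r) == fastest)


k = 5
print(resource_threshold(lambda r: fork_runtime(k, r), max_r=k))            # 5
print(resource_threshold(lambda r: two_chains_runtime(k, r), max_r=2 * k))  # 2
```

For general workflows no such closed form is available, which is where the complexity results of this paper come in.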


[Figure 1: (a) sound free-choice workflow net N; (b) a run of N]


Fig. 1. A sound free-choice workflow net and one of its runs (Color figure online) 


Following previous work, we do not directly work with workflow graphs, but 
with their equivalent representation as free-choice workflow Petri nets, which has 
been shown to be essentially the same model [10] and allows us to directly use 
a wealth of results of free-choice Petri nets [7]. Figure 1(a) shows a free-choice 
workflow net. The actual workflow activities, also called tasks, which need a
resource to execute and which consume time, are modeled as the places of the
net: each place p of the net is assigned a time $\tau(p)$, depicted in blue. Intuitively,
when a token arrives in p, it must execute a task that takes $\tau(p)$ time units before
it can be used to fire a transition. A free choice exists between transitions t4 and
t6, which represents a choice node (if-then-else or loop condition) in
the workflow. 

If no choice is present or all choices are resolved, we have a deterministic 
workflow such as the one in Fig. 1(b). In Petri net terminology, deterministic 
workflows correspond to the class of marked graphs. Deterministic workflows are 
common in practice: in the standard suite of 642 industrial workflows that we use 
for experiments, 63.7% are deterministic. We show that already for this restricted
class, deciding whether the resource threshold exceeds a given bound is NP-hard. Therefore, we
investigate an over-approximation of the resource threshold, already introduced
in [4]: the concurrency threshold. This is the maximal number of task places that
can be simultaneously marked at a reachable marking. Clearly, if a workflow with
concurrency threshold k is executed with k resources, then we can always start
the task of a place immediately after a token arrives, and this schedule already
achieves the fastest runtime achievable with unlimited resources. We show that
the concurrency threshold can be computed in polynomial time for deterministic
workflows.
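The concurrency threshold can be computed exactly by exhaustive state-space exploration, i.e., a breadth-first search over the reachable markings. A minimal Python sketch, assuming a 1-safe net (so a marking is just the set of marked places) given as a dictionary from transition names to (preset, postset) pairs, and assuming that the "task places" are exactly those passed in `task_places` (for instance, the places with positive duration). The example net is hypothetical.

```python
from collections import deque

# A 1-safe net: transition name -> (preset, postset); markings are frozensets.
Net = dict[str, tuple[frozenset[str], frozenset[str]]]


def concurrency_threshold(net: Net, initial: frozenset[str],
                          task_places: frozenset[str]) -> int:
    """Maximal number of task places marked at a reachable marking (BFS)."""
    seen = {initial}
    queue = deque([initial])
    best = 0
    while queue:
        marking = queue.popleft()
        best = max(best, len(marking & task_places))
        for pre, post in net.values():
            if pre <= marking:                 # transition is enabled
                rest = marking - pre
                if rest & post:                # some place would get 2 tokens
                    raise ValueError("net is not 1-safe")
                succ = rest | post
                if succ not in seen:
                    seen.add(succ)
                    queue.append(succ)
    return best


def tr(pre, post):
    return frozenset(pre), frozenset(post)


# Hypothetical example: two parallel chains a1 -> a2 and b1 -> b2 between i and o.
chains = {
    "fork": tr({"i"}, {"a1", "b1"}),
    "ta": tr({"a1"}, {"a2"}),
    "tb": tr({"b1"}, {"b2"}),
    "join": tr({"a2", "b2"}, {"o"}),
}
tasks = frozenset({"a1", "a2", "b1", "b2"})
print(concurrency_threshold(chains, frozenset({"i"}), tasks))  # 2
```

The number of reachable markings can grow exponentially with the size of the net, so exact computation by exploration only scales to moderately sized state spaces.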

For workflows with nondeterministic choice, corresponding to free-choice 
nets, we show that computing the concurrency threshold of free-choice workflow 
nets is NP-hard, solving a problem left open in [4]. We even prove that the
problem remains NP-hard for sound free-choice workflows. Soundness is the dominant
behavioral correctness notion for workflows, which rules out basic control-flow 
errors such as deadlocks. NP-hardness in the sound case is remarkable, because 
many analysis problems that have high complexity in the unsound case can be 
solved in polynomial time in the sound case (see e.g. [1,7,8]). 

After our complexity analysis, we design an algorithm to compute bounds 
on the concurrency threshold using a combination of linear optimization and 
state-space exploration. We evaluate it on a benchmark suite of 642 sound
free-choice workflow nets from an industrial source (IBM) [9]. The bounds can be
computed in a total of 7 s (over all 642 nets). In contrast, the computation
of the exact value by state-space exploration techniques times out for the three
largest nets, and takes 7 min for the rest. (Observe that partial-order reduction
techniques cannot be used, because one may then miss the interleaving realizing 
the concurrency threshold.) 

The paper is structured as follows. Section 2 contains preliminaries. Sections 3 
and 4 study the resource and concurrency thresholds, respectively. Section 5 
presents our algorithms for computing the concurrency bound, and experimental 
results. Finally, Sect. 6 contains conclusions.


2 Preliminaries 


Petri Nets. A Petri net N is a tuple (P, T, F) where P is a finite set of places,
T is a finite set of transitions ($P \cap T = \emptyset$), and $F \subseteq (P \times T) \cup (T \times P)$ is a
set of arcs. The preset of $x \in P \cup T$ is ${}^\bullet x := \{y \mid (y, x) \in F\}$ and its postset is
$x^\bullet := \{y \mid (x, y) \in F\}$. We extend the definition of presets and postsets to sets of
places and transitions $X \subseteq P \cup T$ by ${}^\bullet X := \bigcup_{x \in X} {}^\bullet x$ and $X^\bullet := \bigcup_{x \in X} x^\bullet$. A net
is acyclic if the relation $F^*$ is a partial order, denoted by $\leq$ and called the causal
order. A node x of an acyclic net is causally maximal if no node y satisfies $x < y$.
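The definitions of presets, postsets, acyclicity, and causal maximality translate directly into code. The Python sketch below stores the flow relation F as a set of pairs; the class and method names are illustrative, not taken from any library. It relies on two facts: $F^*$ is a partial order exactly when F has no cycle, and in an acyclic net a node is causally maximal exactly when its postset is empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PetriNet:
    places: frozenset[str]
    transitions: frozenset[str]
    arcs: frozenset[tuple[str, str]]  # F, a subset of (P x T) u (T x P)

    def __post_init__(self) -> None:
        assert not (self.places & self.transitions), "P and T must be disjoint"

    def preset(self, x: str) -> set[str]:
        return {y for (y, z) in self.arcs if z == x}

    def postset(self, x: str) -> set[str]:
        return {z for (y, z) in self.arcs if y == x}

    def preset_of(self, xs: set[str]) -> set[str]:
        return set().union(*(self.preset(x) for x in xs))

    def postset_of(self, xs: set[str]) -> set[str]:
        return set().union(*(self.postset(x) for x in xs))

    def is_acyclic(self) -> bool:
        """Kahn's algorithm: F* is a partial order iff F has no cycle."""
        nodes = self.places | self.transitions
        indegree = {x: len(self.preset(x)) for x in nodes}
        ready = [x for x in nodes if indegree[x] == 0]
        removed = 0
        while ready:
            x = ready.pop()
            removed += 1
            for y in self.postset(x):
                indegree[y] -= 1
                if indegree[y] == 0:
                    ready.append(y)
        return removed == len(nodes)

    def causally_maximal(self) -> set[str]:
        """In an acyclic net, x is causally maximal iff no arc leaves x."""
        return {x for x in self.places | self.transitions if not self.postset(x)}


net = PetriNet(
    places=frozenset({"i", "p1", "o"}),
    transitions=frozenset({"t1", "t2"}),
    arcs=frozenset({("i", "t1"), ("t1", "p1"), ("p1", "t2"), ("t2", "o")}),
)
print(net.preset("t2"), net.postset("t1"))       # {'p1'} {'p1'}
print(net.is_acyclic(), net.causally_maximal())  # True {'o'}
```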

A marking of a Petri net is a function $M : P \to \mathbb{N}$, representing the number of
tokens in each place. For a set of places $S \subseteq P$, we define $M(S) := \sum_{p \in S} M(p)$.
Further, for a set of places $S \subseteq P$, we denote by $M_S$ the marking with $M_S(p) = 1$
for $p \in S$ and $M_S(p) = 0$ for $p \notin S$.

A transition t is enabled at a marking M if for all $p \in {}^\bullet t$, we have $M(p) \geq 1$.
If t is enabled at M, it may occur, leading to a marking M' obtained by removing
one token from each place of ${}^\bullet t$ and then adding one token to each place of $t^\bullet$.
We denote this by $M \xrightarrow{t} M'$. Let $\sigma = t_1 t_2 \ldots t_n$ be a sequence of transitions.


For a marking $M_0$, $\sigma$ is an occurrence sequence if $M_0 \xrightarrow{t_1} M_1 \xrightarrow{t_2} \cdots \xrightarrow{t_n} M_n$
for some markings $M_1, \ldots, M_n$. We say that $M_n$ is reachable from $M_0$ by $\sigma$ and
denote this by $M_0 \xrightarrow{\sigma} M_n$. The set of all markings reachable from M in N by some
occurrence sequence $\sigma$ is denoted by $\mathcal{R}_N(M)$. A system is a pair (N, M) of a Petri
net N and a marking M. A system (N, M) is live if for every $M' \in \mathcal{R}_N(M)$ and
every transition t some marking $M'' \in \mathcal{R}_N(M')$ enables t. The system is 1-safe
if $M'(p) \leq 1$ for every $M' \in \mathcal{R}_N(M)$ and every place $p \in P$.
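The occurrence rule and the 1-safety condition can be prototyped with a breadth-first search over reachable markings. In this Python sketch (an illustration with names of our choosing), markings are `collections.Counter` objects so that a second token in a place can be detected, and the search stops at the first such marking. It terminates on every finite net: a 1-safe system has at most $2^{|P|}$ reachable markings, and otherwise some violating marking lies at finite depth while each BFS level is finite.

```python
from collections import Counter, deque

# transition name -> (preset, postset), as lists of place names
Net = dict[str, tuple[list[str], list[str]]]


def enabled(marking: Counter, pre: list[str]) -> bool:
    return all(marking[p] >= 1 for p in pre)


def fire(marking: Counter, pre: list[str], post: list[str]) -> Counter:
    """M -t-> M': remove one token per preset place, add one per postset place."""
    succ = Counter(marking)
    succ.subtract(pre)
    succ.update(post)
    return +succ  # drop places with zero tokens


def freeze(marking: Counter) -> frozenset:
    return frozenset(marking.items())


def is_one_safe(net: Net, m0: Counter) -> bool:
    """BFS over R_N(M0); returns False at the first marking with M(p) > 1."""
    if any(n > 1 for n in m0.values()):
        return False
    seen = {freeze(m0)}
    queue = deque([m0])
    while queue:
        marking = queue.popleft()
        for pre, post in net.values():
            if enabled(marking, pre):
                succ = fire(marking, pre, post)
                if any(n > 1 for n in succ.values()):
                    return False
                if freeze(succ) not in seen:
                    seen.add(freeze(succ))
                    queue.append(succ)
    return True


safe = {"t1": (["i"], ["p1", "p2"]), "t2": (["p1", "p2"], ["o"])}
unsafe = {"t": (["i"], ["i", "p"])}  # i stays marked, p accumulates tokens
print(is_one_safe(safe, Counter({"i": 1})))    # True
print(is_one_safe(unsafe, Counter({"i": 1})))  # False
```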


Convention: Throughout this paper we assume that systems are 1-safe, i.e., we
identify “system” and “1-safe system”.


Net Classes. A net N = (P, T, F) is a marked graph if $|{}^\bullet p| \leq 1$ and $|p^\bullet| \leq 1$ for
every place $p \in P$, and a free-choice net if for any two places $p_1, p_2 \in P$ either
$p_1^\bullet \cap p_2^\bullet = \emptyset$ or $p_1^\bullet = p_2^\bullet$.
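Both net classes are syntactic conditions on places and can be checked directly on the flow relation. A small Python sketch, with F given as a set of (source, target) pairs and helper names chosen for illustration:

```python
from itertools import combinations


def pre(x: str, arcs: set[tuple[str, str]]) -> set[str]:
    return {a for (a, b) in arcs if b == x}


def post(x: str, arcs: set[tuple[str, str]]) -> set[str]:
    return {b for (a, b) in arcs if a == x}


def is_marked_graph(places: set[str], arcs: set[tuple[str, str]]) -> bool:
    """Every place has at most one input and at most one output transition."""
    return all(len(pre(p, arcs)) <= 1 and len(post(p, arcs)) <= 1 for p in places)


def is_free_choice(places: set[str], arcs: set[tuple[str, str]]) -> bool:
    """Any two places have either disjoint or identical postsets."""
    for p1, p2 in combinations(places, 2):
        s1, s2 = post(p1, arcs), post(p2, arcs)
        if s1 & s2 and s1 != s2:
            return False
    return True


# p chooses between t1 and t2 (a free choice); adding q -> t2 breaks free choice.
choice = {("p", "t1"), ("p", "t2")}
print(is_marked_graph({"p"}, choice), is_free_choice({"p"}, choice))  # False True
confused = choice | {("q", "t2")}
print(is_free_choice({"p", "q"}, confused))  # False
```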


Non-sequential Processes of Petri Nets. An (A, B)-labeled Petri net is a
tuple $N = (P, T, F, \lambda, \mu)$, where $\lambda: P \to A$ and $\mu: T \to B$ are labeling functions
over alphabets A, B. The nonsequential processes of a 1-safe system (N, M) are
acyclic, (P, T)-labeled marked graphs. Say that a set $P''$ of places of a
(P, T)-labeled acyclic net enables $t \in T$ if all the places of $P''$ are causally maximal,
carry pairwise distinct labels, and $\lambda(P'') = {}^\bullet t$.


Definition 1. Let N = (P, T, F) be a Petri net and let M be a marking of N.
The set NP(N, M) of nonsequential processes of (N, M) (processes for short)
is the set of (P, T)-labeled Petri nets defined inductively as follows:


- The (P, T)-labeled Petri net containing for each place $p \in P$ marked at M one
place $\hat{p}$ labeled by p, no other places, and no transitions, belongs to NP(N, M).
- If $\Pi = (P', T', F', \lambda, \mu) \in NP(N, M)$ and $P'' \subseteq P'$ enables some transition t
of N, then the (P, T)-labeled net $\Pi_t = (P' \uplus \hat{P}, T' \uplus \{\hat{t}\}, F' \uplus \hat{F}, \lambda \uplus \hat{\lambda}, \mu \uplus \hat{\mu})$,
where
  • $\hat{P} = \{\hat{p} \mid p \in t^\bullet\}$, with $\hat{\lambda}(\hat{p}) = p$, and $\hat{\mu}(\hat{t}) = t$;
  • $\hat{F} = \{(p'', \hat{t}) \mid p'' \in P''\} \cup \{(\hat{t}, \hat{p}) \mid \hat{p} \in \hat{P}\}$;
also belongs to NP(N, M). We say that $\Pi_t$ extends $\Pi$.


We denote the minimal and maximal places of a process $\Pi$ w.r.t. the causal
order by $\min(\Pi)$ and $\max(\Pi)$, respectively.


As usual, we say that two processes are isomorphic if they are the same up to 
renaming of the places and transitions (notice that we rename only the names 
of the places and transitions, not their labels). 

Figure 2 shows two processes of the workflow net in Fig. 1(a). (The figure 
does not show the names of places and transitions, only their labels.) The net 
containing the white and grey nodes only is already a process, and the grey
places are causally maximal places that enable t6. Therefore, according to the
definition we can extend the process with the green nodes to produce another 
process. On the right we extend the same process in a different way, with the 
transition t4. 
Fig. 2. Nonsequential processes of the net of Fig. 1(a) (Color figure online)


The following is well known. Let $\Pi = (P', T', F', \lambda, \mu)$ be a process of (N, M):


— For every linearization $\sigma = t'_1 \ldots t'_n$ of $T'$ respecting the causal order $\leq$, the
sequence $\mu(\sigma) = \mu(t'_1) \ldots \mu(t'_n)$ is a firing sequence of (N, M). Further, all
these firing sequences lead to the same marking. We call it the final marking
of $\Pi$, and say that $\Pi$ leads from M to its final marking.

For example, in Fig. 2 the sequences of the right process labeled by $t_1 t_2 t_3 t_4$
and $t_1 t_3 t_2 t_4$ are firing sequences leading to the marking M = {po, ps, p7}.

— For every firing sequence $t_1 \cdots t_n$ of (N, M) there is a process $(P', T', F', \lambda, \mu)$
such that $T' = \{t'_1, \ldots, t'_n\}$, $\mu(t'_i) = t_i$ for every $1 \leq i \leq n$, and $t'_i < t'_j$
implies $i < j$.
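The first fact can be tested on small examples by enumerating all linearizations of a process and replaying their labels in N. The following Python sketch is illustrative only: it assumes the process is given by its transitions (with labels) and its arcs, and, for simplicity, names each process place after its label, which works here because no label repeats. The net and process are hypothetical (t1 forks into two branches t2 and t3, which t4 joins).

```python
from collections import Counter
from typing import Iterator


def linearizations(trans: dict[str, str],
                   arcs: set[tuple[str, str]]) -> Iterator[list[str]]:
    """All orderings of the process transitions that respect the causal order."""
    producer = {p: t for (t, p) in arcs if t in trans}  # place -> its input transition
    preds = {t: {producer[p] for (p, u) in arcs if u == t and p in producer}
             for t in trans}

    def extend(prefix: list[str], remaining: frozenset[str]) -> Iterator[list[str]]:
        if not remaining:
            yield list(prefix)
            return
        for t in sorted(remaining):
            if preds[t] <= set(prefix):  # all causal predecessors already placed
                prefix.append(t)
                yield from extend(prefix, remaining - {t})
                prefix.pop()

    yield from extend([], frozenset(trans))


def replay(net: dict[str, tuple[list[str], list[str]]],
           marking: Counter, labels: list[str]) -> frozenset:
    """Fire the labels in N and return the final marking (raises if not enabled)."""
    m = Counter(marking)
    for t in labels:
        pre, post = net[t]
        if any(m[p] < 1 for p in pre):
            raise ValueError(f"{t} is not enabled")
        m.subtract(pre)
        m.update(post)
    return frozenset((+m).items())


net = {"t1": (["i"], ["p1", "p2"]), "t2": (["p1"], ["p3"]),
       "t3": (["p2"], ["p4"]), "t4": (["p3", "p4"], ["o"])}
proc_trans = {"u1": "t1", "u2": "t2", "u3": "t3", "u4": "t4"}  # name -> label
proc_arcs = {("i", "u1"), ("u1", "p1"), ("u1", "p2"), ("p1", "u2"), ("p2", "u3"),
             ("u2", "p3"), ("u3", "p4"), ("p3", "u4"), ("p4", "u4"), ("u4", "o")}

orders = list(linearizations(proc_trans, proc_arcs))
finals = {replay(net, Counter({"i": 1}), [proc_trans[u] for u in order])
          for order in orders}
print(orders)  # [['u1', 'u2', 'u3', 'u4'], ['u1', 'u3', 'u2', 'u4']]
print(finals)  # {frozenset({('o', 1)})}
```

A single element in `finals` is exactly the "same final marking" property stated above.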


Workflow Nets. We slightly generalize the definition of workflow net as
presented in e.g. [1] by allowing multiple initial and final places. A workflow net is
a Petri net with two distinguished sets I and O of input places and output places
such that (a) ${}^\bullet I = \emptyset = O^\bullet$ and (b) for all $x \in P \cup T$, there exists a path from
some $i \in I$ to some $o \in O$ passing through x. The markings $M_I$ and $M_O$ are
called initial and final markings of N. A workflow net N is sound if


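- $\forall M \in \mathcal{R}_N(M_I)$: $M_O \in \mathcal{R}_N(M)$,


- $\forall M \in \mathcal{R}_N(M_I)$: $(M(O) \geq |O|) \rightarrow (M = M_O)$, and
- $\forall t \in T\ \exists M \in \mathcal{R}_N(M_I)$: t is enabled at M.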


It is well-known that every sound free-choice workflow net is a 1-safe system with
the initial marking $M_I$ [2,7]. Given a workflow net according to this definition
one can construct another one with one single input place i and output place o
and two transitions $t_i, t_o$ with ${}^\bullet t_i = \{i\}$, $t_i^\bullet = I$ and ${}^\bullet t_o = O$, $t_o^\bullet = \{o\}$. For all
purposes of this paper these two workflow nets are equivalent.
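For small nets, the three soundness conditions can be checked on the reachability graph. The Python sketch below assumes a 1-safe net (the paper's convention), so markings are sets of places and the exploration is finite; under that assumption $M(O) \geq |O|$ simply says that every output place is marked. The net encoding, the helper `tr`, and the example nets are illustrative.

```python
from collections import deque

Net = dict[str, tuple[frozenset[str], frozenset[str]]]


def reachability_graph(net: Net, m0: frozenset[str]) -> dict:
    """Map each reachable marking to its list of (transition, successor) edges."""
    edges = {m0: []}
    queue = deque([m0])
    while queue:
        m = queue.popleft()
        for name, (pre, post) in net.items():
            if pre <= m:
                rest = m - pre
                if rest & post:
                    raise ValueError("net is not 1-safe")
                succ = rest | post
                edges[m].append((name, succ))
                if succ not in edges:
                    edges[succ] = []
                    queue.append(succ)
    return edges


def is_sound(net: Net, inputs: set[str], outputs: set[str]) -> bool:
    m_i, m_o = frozenset(inputs), frozenset(outputs)
    edges = reachability_graph(net, m_i)
    if m_o not in edges:
        return False
    # (1) M_O is reachable from every reachable marking: backward search from M_O.
    back = {m: set() for m in edges}
    for m, outs in edges.items():
        for _, succ in outs:
            back[succ].add(m)
    can_finish, stack = {m_o}, [m_o]
    while stack:
        for m in back[stack.pop()]:
            if m not in can_finish:
                can_finish.add(m)
                stack.append(m)
    if len(can_finish) != len(edges):
        return False
    # (2) once all output places are marked, the marking is exactly M_O.
    if any(m_o <= m and m != m_o for m in edges):
        return False
    # (3) every transition is enabled at some reachable marking.
    return {name for outs in edges.values() for name, _ in outs} == set(net)


def tr(pre, post):
    return frozenset(pre), frozenset(post)


diamond = {"t1": tr({"i"}, {"p1", "p2"}), "t2": tr({"p1"}, {"p3"}),
           "t3": tr({"p2"}, {"p4"}), "t4": tr({"p3", "p4"}, {"o"})}
deadlock = {"t1": tr({"i"}, {"p1"}), "t2": tr({"p1"}, {"p2"}),
            "t3": tr({"p1"}, {"p3"}), "t4": tr({"p2"}, {"o"})}
print(is_sound(diamond, {"i"}, {"o"}))   # True
print(is_sound(deadlock, {"i"}, {"o"}))  # False: {p3} cannot reach {o}
```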

Given a workflow net N, we say that a process $\Pi$ of $(N, M_I)$ is a run if it
leads to $M_O$. For example, the net in Fig. 1(b) is a run of the net in Fig. 1(a).


Petri Nets with Task Durations. We consider Petri nets in which, intuitively,
when a token arrives in a place p it has to execute a task taking $\tau(p)$ time units
before the token can be used to fire any transition. Formally, we consider tuples
$N = (P, T, F, \tau)$ where (P, T, F) is a net and $\tau: P \to \mathbb{N}$.
Definition 2. Given a nonsequential process $\Pi = (P', T', F', \lambda, \mu)$ of (N, M),
a time bound t, and a number of resources k, we say that $\Pi$ is executable within
time t with k resources if there is a function $f: P' \to \mathbb{N}$ such that


(1) for every $p'_1, p'_2 \in P'$: if $p'_1 < p'_2$ then $f(p'_1) + \tau(\lambda(p'_1)) \leq f(p'_2)$;

(2) for every $p' \in P'$: $f(p') + \tau(\lambda(p')) \leq t$; and

(3) for every $0 \leq u < t$ there are at most k places $p' \in P'$ such that $f(p') \leq u < f(p') + \tau(\lambda(p'))$.


We call a function f satisfying (1) a schedule, a function satisfying (1) and (2)
a t-schedule, and a function satisfying (1)–(3) a (k, t)-schedule of $\Pi$.


Intuitively, $f(p')$ describes the starting time of the task executed at $p'$. Condition
(1) states that if $p'_1 < p'_2$, then the task associated to $p'_2$ can only start after the
task for $p'_1$ has ended; condition (2) states that all tasks are done by time t, and
condition (3) that at any moment in time at most k tasks are being executed.
As an example, the process in Fig. 1(b) can be executed with two resources in
time 6 with the schedule i, p1, p2 ↦ 0; p3, p4 ↦ 1; p7, p6 ↦ 3; and p8, p9 ↦ 4.
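Conditions (1) to (3) of Definition 2 can be checked mechanically. In the Python sketch below (illustrative names), `dur[p]` stands for $\tau(\lambda(p))$, `order` lists pairs (p1, p2) with p1 < p2, and `f` maps places to start times. Two simplifications rely on assumptions stated in the comments: listing only the covering pairs of the causal order suffices because durations are non-negative, and checking condition (3) at integer instants u suffices because start times and durations are integers. The example process is hypothetical, not the one of Fig. 1(b).

```python
def is_schedule(dur: dict[str, int], order: set[tuple[str, str]],
                f: dict[str, int], k: int, t: int) -> bool:
    """Check conditions (1)-(3), i.e., whether f is a (k, t)-schedule."""
    # (1) a task starts only after every causally earlier task has ended.
    # Covering pairs suffice: durations are >= 0, so the inequality propagates.
    if any(f[a] + dur[a] > f[b] for a, b in order):
        return False
    # (2) every task ends by time t.
    if any(f[p] + dur[p] > t for p in dur):
        return False
    # (3) at most k tasks run at any instant; integer instants suffice because
    # the set of running tasks only changes at integer times.
    for u in range(t):
        if sum(1 for p in dur if f[p] <= u < f[p] + dur[p]) > k:
            return False
    return True


# Hypothetical process: a (1 unit) precedes c (1 unit); b (2 units) is independent.
dur = {"a": 1, "b": 2, "c": 1}
order = {("a", "c")}
f = {"a": 0, "b": 0, "c": 1}
print(is_schedule(dur, order, f, k=2, t=2))  # True
print(is_schedule(dur, order, f, k=1, t=2))  # False: a and b overlap at u = 0
```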

Given a process $\Pi = (P', T', F', \lambda, \mu)$ of (N, M) we define the schedule
$f_{\min}$ as follows: if $p' \in \min(\Pi)$ then $f_{\min}(p') = 0$, otherwise define
$f_{\min}(p') = \max\{f_{\min}(p'') + \tau(\lambda(p'')) \mid p'' < p'\}$. Further, we define the minimal execution
time $t_{\min}(\Pi) = \max\{f_{\min}(p') + \tau(\lambda(p')) \mid p' \in \max(\Pi)\}$. In the process in Fig. 1(b),
the schedule $f_{\min}$ is the function that assigns i, p1, p2, p7 ↦ 0; p3, p4 ↦ 1;
p6, p8 ↦ 3; p9 ↦ 4; and o ↦ 6, and so $t_{\min}(\Pi) = 6$. We have:


Lemma 1. A process $\Pi = (P', T', F', \lambda, \mu)$ of (N, M) can be executed within
time $t_{\min}(\Pi)$ with $|P'|$ resources, and cannot be executed faster with any number
of resources.


Proof. For $k \geq |P'|$ resources condition (3) of Definition 2 holds vacuously, so $\Pi$
is executable within time t iff conditions (1) and (2) hold. Since $f_{\min}$ satisfies
(1) and (2) for $t = t_{\min}(\Pi)$, $\Pi$ can be executed within time $t_{\min}(\Pi)$. Further,
by induction along the causal order, every function satisfying (1) is pointwise at least $f_{\min}$;
hence $t_{\min}(\Pi)$ is the smallest time for which (1) and (2) can hold, and so $\Pi$ cannot be
executed faster with any number of resources.


3 Resource Threshold 


We define the resource threshold of a run of a workflow net, and of the net itself. 
Intuitively, the resource threshold of a run is the minimal number of resources 
that allows one to execute it as fast as with unlimited resources, and the resource 
threshold of a workflow net is the minimal number of resources that allows one 
to execute every run as fast as with unlimited resources. 


Definition 3. Let N be a workflow net, and let $\Pi$ be a run of N. The resource
threshold of $\Pi$, denoted by $RT(\Pi)$, is the smallest number k such that $\Pi$ can be
executed in time $t_{\min}(\Pi)$ with k resources. A schedule of $\Pi$ realizes the resource
threshold if it is an $(RT(\Pi), t_{\min}(\Pi))$-schedule.
The resource threshold of N, denoted by RT(N), is defined by
$RT(N) = \max\{RT(\Pi) \mid \Pi \text{ is a run of } (N, M_I)\}$. A schedule of N is a function that assigns
to every process $\Pi \in NP(N, M_I)$ a schedule of $\Pi$. A schedule of N is a (k, t)-schedule
if it assigns to every run $\Pi$ a (k, t)-schedule of $\Pi$. A schedule of N
realizes the resource threshold if it assigns to every run $\Pi$ an $(RT(N), t_{\min}(\Pi))$-schedule.
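For tiny runs, $RT(\Pi)$ can be computed by brute force directly from the definitions: compute $t_{\min}(\Pi)$, then try k = 0, 1, 2, and so on until a $(k, t_{\min}(\Pi))$-schedule exists. The Python sketch below takes exponential time in the worst case and is meant only as an executable restatement of Definitions 2 and 3; the place-level encoding (`dur`, `preds`, as in a longest-path view of the process) and all names are our own assumptions, and the two example runs are hypothetical.

```python
from graphlib import TopologicalSorter


def earliest(p, f, dur, preds) -> int:
    return max((f[q] + dur[q] for q in preds.get(p, ())), default=0)


def t_min(dur: dict[str, int], preds: dict[str, set[str]]) -> int:
    f: dict[str, int] = {}
    for p in TopologicalSorter(preds).static_order():
        f[p] = earliest(p, f, dur, preds)
    return max(f[p] + dur[p] for p in dur)  # durations >= 0: same as over max places


def feasible(dur, preds, k: int, t: int) -> bool:
    """Exhaustive search for a (k, t)-schedule with integer start times."""
    order = list(TopologicalSorter(preds).static_order())
    usage = [0] * t  # usage[u] = number of tasks running at instant u
    f: dict[str, int] = {}

    def assign(i: int) -> bool:
        if i == len(order):
            return True
        p = order[i]
        for start in range(earliest(p, f, dur, preds), t - dur[p] + 1):  # (1), (2)
            slots = range(start, start + dur[p])
            if all(usage[u] < k for u in slots):                          # (3)
                for u in slots:
                    usage[u] += 1
                f[p] = start
                if assign(i + 1):
                    return True
                for u in slots:
                    usage[u] -= 1
        return False

    return assign(0)


def resource_threshold(dur, preds) -> int:
    t = t_min(dur, preds)
    # Lemma 1: k = number of places always suffices, so the search terminates.
    return next(k for k in range(len(dur) + 1) if feasible(dur, preds, k, t))


# Hypothetical runs with unit-duration tasks and zero-duration i and o.
chains = ({"i": 0, "a1": 1, "a2": 1, "b1": 1, "b2": 1, "o": 0},
          {"a1": {"i"}, "b1": {"i"}, "a2": {"a1"}, "b2": {"b1"}, "o": {"a2", "b2"}})
fork3 = ({"i": 0, "a": 1, "b": 1, "c": 1, "o": 0},
         {"a": {"i"}, "b": {"i"}, "c": {"i"}, "o": {"a", "b", "c"}})
print(resource_threshold(*chains))  # 2
print(resource_threshold(*fork3))   # 3
```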


Example 1. We have seen in the previous section that for the process in Fig. 1(b)
we have $t_{\min}(\Pi) = 6$, and a schedule with two resources already achieves this
time. So the resource threshold of this run is 2. The workflow net of Fig. 1 has infinitely many
runs, in which, loosely speaking, the net executes t4 arbitrarily many times, until
it “exits the loop” by choosing t6, followed by t7 and t8. It can be shown that all
runs have resource threshold 2, and so that is also the resource threshold
of the net.


In the rest of the section we obtain two negative results about the resource
threshold. First, it is difficult to compute: Determining if the resource threshold
exceeds a given bound is NP-complete even for acyclic marked graphs, a
very simple class of workflows. Second, we show that even for acyclic free-choice 
workflow nets the resource threshold may not be realized by any online scheduler. 


3.1 Resource Threshold Is NP-complete for Acyclic Marked Graphs 


We prove that deciding if the resource threshold exceeds a given bound is
NP-complete even for acyclic sound marked graphs. The proof proceeds by reduction
from the following classical scheduling problem, proved NP-complete in [18]: 


Given: a finite, partially ordered set of jobs with non-negative integer 
durations, and non-negative integers t and k. 
Decide: Can all jobs be executed with k machines within t time units
in a way that respects the given partial order, i.e., a job is started only 
after all its predecessors have been finished? 


More formally, the problem is defined as follows: Given jobs $J = \{J_1, \ldots, J_n\}$,
where $J_i$ has duration $\tau(J_i)$ for every $1 \leq i \leq n$, and a partial order $<$ on J,
does there exist a function $f: J \to \mathbb{N}$ such that

(1) for every $1 \leq i, j \leq n$: if $J_i < J_j$ then $f(J_i) + \tau(J_i) \leq f(J_j)$;

(2) for every $1 \leq i \leq n$: $f(J_i) + \tau(J_i) \leq t$; and

(3) for every $0 \leq u < t$ there are at most k indices i such that $f(J_i) \leq u < f(J_i) + \tau(J_i)$?

These conditions are almost identical to the ones we used to define whether a
nonsequential process can be executed within time t with k resources. We exploit this
to construct an acyclic workflow marked graph that “simulates” the scheduling
problem. For the detailed proof, we refer to the full version of this paper [15].


Theorem 1. The following problem is NP-complete: 


Given: An acyclic, sound workflow marked graph N, and a number k. 
Decide: Does $RT(N) \leq k$ hold?
3.2 Acyclic Free-Choice Workflow Nets May Have no Optimal
Online Schedulers 


A resource threshold of k guarantees that every run can be executed without 
penalty with k resources. In other words, there exists a schedule that achieves 
optimal runtime. However, in many applications the schedule must be
determined at runtime, that is, the resources must be allocated without knowing how
choices will be resolved in the future. In order to formalize this idea we define 
the notion of an online schedule of a workflow net N. 


Definition 4. Let N be a Petri net, and let $\Pi$ and $\Pi'$ be two processes of
(N, M). We say that $\Pi$ is a prefix of $\Pi'$, denoted by $\Pi \preceq \Pi'$, if there is a
sequence $\Pi_1, \ldots, \Pi_n$ of processes such that $\Pi_1 = \Pi$, $\Pi_n = \Pi'$, and $\Pi_{i+1}$ extends
$\Pi_i$ by one transition for every $1 \leq i \leq n-1$.

Let f be a schedule of (N, M), i.e., a function assigning a schedule to each
process. We say that f is an online schedule if for every two runs $\Pi_1, \Pi_2$, and
for every two prefixes $\Pi'_1 \preceq \Pi_1$ and $\Pi'_2 \preceq \Pi_2$: if $\Pi'_1$ and $\Pi'_2$ are isomorphic, then
$f(\Pi'_1) = f(\Pi'_2)$.


Intuitively, if $\Pi'_1$ and $\Pi'_2$ are isomorphic then they are the same process $\Pi$,
which in the future can be extended to either $\Pi_1$ or $\Pi_2$, depending on which
transitions occur. In an online schedule, $\Pi$ is scheduled in the same way,
independently of whether it will become $\Pi_1$ or $\Pi_2$ in the future. We show that even
for acyclic free-choice workflow nets there may be no online schedule that realizes
the resource threshold. That is, even though for every run it is possible to
schedule the tasks with RT(N) resources to achieve optimal runtime, this requires
knowing how the run will evolve before the execution of the workflow.


Proposition 1. There is an acyclic, sound free-choice workflow net for which 
no online schedule realizes the resource threshold. 


Fig. 3. A workflow net with two runs. No online scheduler for three resources achieves 
the minimal runtime in both runs. (Color figure online) 
Proof. Consider the sound free-choice workflow net $(N, M_I)$ of Fig. 3. It has two runs: $\Pi_g$, which executes the grey and green transitions, and $\Pi_r$, which executes the grey and red transitions. Their resource thresholds are $RT(\Pi_g) = RT(\Pi_r) = 3$, realized by the schedules $f_g$ and $f_r$ in Fig. 4:


[Figure: Gantt charts of the schedules $f_g$ (left) and $f_r$ (right), allocating tasks to resources 1 to 3 over the time interval 0 to 5.]

Fig. 4. Schedules $f_g$ and $f_r$ for the two runs $\Pi_g$ and $\Pi_r$ of the net of Fig. 3.


Indeed, observe that $f_g$ and $f_r$ execute $\Pi_g$ and $\Pi_r$ within time 5; even with unlimited resources no schedule can be faster because of the task $p_4$, while two or fewer resources are insufficient to execute either run within time 5.

The schedule of $(N, M_I)$ that assigns $f_g$ and $f_r$ to $\Pi_g$ and $\Pi_r$ is not an online schedule. Indeed, the process containing one single transition labeled by $t_1$ and places labeled by $i, p_1, p_2, p_3$ is isomorphic to prefixes of $\Pi_g$ and $\Pi_r$. However, we have $f_g(p_3) = 0 \neq 1 = f_r(p_3)$. We now claim:


(a) Every schedule $f_g$ of $\Pi_g$ that realizes the resource threshold (time 5 with 3 resources) satisfies $f_g(p_3) = 0$.
Indeed, if $f_g(p_3) \geq 1$, then $f_g(p_5) \geq 3$, the delay propagates along the green tasks, and finally $f_g(o) \geq 6$, so $f_g$ does not meet the time bound.
(b) Every schedule $f_r$ of $\Pi_r$ that realizes the resource threshold (time 5 with 3 resources) satisfies $f_r(p_3) > 0$.
Observe first that we necessarily have $f_r(p_4) = 0$, and so a resource, say $R_1$, is bound to $p_4$ during the complete execution of the workflow, leaving two resources. Assume $f_r(p_3) = 0$, i.e., a second resource, say $R_2$, is bound to $p_3$ at time 0, leaving one resource, say $R_3$. Since both $p_1$ and $p_2$ must be executed before $p_6$, and only $R_3$ is free until time 2, we get $f_r(p_6) \geq 2$. So at time 2 we still have to execute $p_6, p_7, p_8$ with resources $R_2, R_3$. Therefore, two out of $p_6, p_7, p_8$ must be executed sequentially by the same resource. Since $p_6, p_7, p_8$ take 2 time units each, one of the two resources needs time 4, and we get $f_r(o) \geq 6$.


By these claims, at time 0, an online schedule has to decide whether to allocate a resource to $p_3$ or not, without knowing which of $t_3$ or $t_4$ will be executed in the future. If it schedules $f(p_3) = 0$ and later $t_4$ occurs, then $\Pi_r$ is executed and the deadline of 5 time units is not met. The same occurs if it schedules $f(p_3) > 0$ and later $t_3$ occurs.
4 Concurrency Threshold


Due to the two negative results presented in the previous section, we study a different parameter, introduced in [4], called the concurrency threshold. During the execution of a business process, information on the resolution of future choices is often not available, and, further, nothing (or only weak bounds) is known about the possible duration of a task. Therefore, in practice, scheduling is performed by assigning a resource to a task at the moment some resource becomes available. The question is: what is the minimal number of resources needed to guarantee the optimal execution time achievable with an unlimited number of resources?

The answer is simple: since there is no information about the duration of tasks, every reachable marking of the workflow net without durations may also be reached for some assignment of durations. Let $M$ be a reachable marking with a maximal number of tokens, say $k$, in places with positive duration, and let $d_1 \leq d_2 \leq \cdots \leq d_k$ be the durations of their associated tasks. If fewer than $k$ resources are available and we do not assign a resource to the task with duration $d_k$, we introduce a delay with respect to the case of an unlimited number of resources. On the contrary, if the number of available resources is $k$, then the scheduler for $k$ resources can always simulate the behaviour of the scheduler for an unlimited number of resources.
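The delay argument can be illustrated with a small simulation. This is a simplified sketch under our own assumptions, not the paper's model: the $k$ tokens of $M$ are treated as $k$ independent tasks all enabled at time 0, each task is started on the earliest free resource, and the scheduler, knowing nothing about durations, starts tasks in some fixed order. Taking the worst case over all orders models the missing duration information. Both function names are hypothetical.

```python
import heapq
from itertools import permutations


def greedy_makespan(durations, resources):
    """Start tasks in the given order, each on the earliest free resource; return the finish time."""
    free_at = [0] * resources  # min-heap: time at which each resource becomes free
    finish = 0
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
        finish = max(finish, start + d)
    return finish


def worst_case_makespan(durations, resources):
    """Worst case over all start orders: the scheduler cannot tell which task is the long one."""
    return max(greedy_makespan(list(order), resources) for order in permutations(durations))


# Three concurrently enabled tasks with durations d_1 <= d_2 <= d_3 = 1, 2, 3.
print(worst_case_makespan([1, 2, 3], 3))  # 3: with k = 3 resources, everything ends at d_3
print(worst_case_makespan([1, 2, 3], 2))  # 4: with 2 resources, some order delays the longest task
```

With $k$ resources every task starts at time 0, so the finish time equals $d_k$; with fewer, some order leaves the task of duration $d_k$ waiting.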
Definition 5. Let $N = (P, T, F, I, O, \tau)$ be a workflow Petri net. For every marking $M$ of $N$, define the concurrency of $M$ as $conc(M) \stackrel{\text{def}}{=} \sum_{p \in D} M(p)$, where $D \subseteq P$ is the set of places $p \in P$ such that $\tau(p) > 0$. The concurrency threshold of $N$ is defined by

$$CT(N) \stackrel{\text{def}}{=} \max \{ conc(M) \mid M \in \mathcal{R}_N(M_I) \}.$$

The following lemma follows easily from the definitions.

Lemma 2. For every workflow net $N$: $RT(N) \leq CT(N)$.


Proof. Follows immediately from the fact that for every schedule $f$ of a run of $N$, there is a schedule $g$ with $CT(N)$ resources such that $g(p) \leq f(p)$ for every place $p$.
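For bounded nets, $CT(N)$ can be computed directly from Definition 5 by exploring all reachable markings. The sketch below does this by breadth-first search. It assumes a bounded net (so the search terminates), encodes arcs as dicts from places to weights, requires every transition to appear as a key of `pre` (possibly with an empty dict), and uses a hypothetical function name and `max_states` cut-off. The state space can be exponential in the size of the net, so this is not an efficient method.

```python
from collections import deque


def concurrency_threshold(places, pre, post, m0, tau, max_states=100_000):
    """max conc(M) over markings M reachable from m0, by breadth-first search."""
    idx = {p: i for i, p in enumerate(places)}
    D = [i for i, p in enumerate(places) if tau.get(p, 0) > 0]  # places with positive duration
    start = tuple(m0.get(p, 0) for p in places)
    seen, queue, best = {start}, deque([start]), 0
    while queue:
        m = queue.popleft()
        best = max(best, sum(m[i] for i in D))  # conc(M)
        for t, inputs in pre.items():
            if all(m[idx[p]] >= w for p, w in inputs.items()):  # t is enabled at m
                nxt = list(m)
                for p, w in inputs.items():
                    nxt[idx[p]] -= w
                for p, w in post.get(t, {}).items():
                    nxt[idx[p]] += w
                nxt = tuple(nxt)
                if nxt not in seen:
                    if len(seen) >= max_states:
                        raise RuntimeError("state space too large for this sketch")
                    seen.add(nxt)
                    queue.append(nxt)
    return best


# Toy fork/join net: t1 moves the token from i to a and b, t2 joins a and b into o.
places = ["i", "a", "b", "o"]
pre = {"t1": {"i": 1}, "t2": {"a": 1, "b": 1}}
post = {"t1": {"a": 1, "b": 1}, "t2": {"o": 1}}
print(concurrency_threshold(places, pre, post, {"i": 1}, {"i": 1, "a": 1, "b": 1}))  # 2
```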


In the rest of the paper we study the complexity of computing the concur- 
rency threshold. In [4], it was shown that the threshold can be computed in 
polynomial time for regular workflows, a class with a very specific structure, and 
the problem for the general free-choice case was left open. In Sect. 4.1 we prove 
that the concurrency threshold of marked graphs can be computed in polynomial 
time by reduction to a linear programming problem over the rational numbers. 
In Sect. 4.2 we study the free-choice case. We show that deciding if the thresh- 
old exceeds a given value is NP-complete for acyclic, sound free-choice workflow 
nets. Further, it can be computed by solving the same linear programming prob- 
lem as in the case of marked graphs, but over the integers. Finally, we show 
that in the cyclic case the problem remains NP-complete, but the integer linear 
programming problem does not necessarily yield the correct solution. 
4.1 Concurrency Threshold of Marked Graphs


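The concurrency threshold of marked graphs can be computed using a standard technique based on the marking equation [16]. Given a net $N = (P, T, F)$, define the incidence matrix of $N$ as the $|P| \times |T|$ matrix $\mathbf{N}$ given by:

$$\mathbf{N}(p, t) = \begin{cases} 1 & \text{if } p \in t^\bullet \setminus {}^\bullet t \\ -1 & \text{if } p \in {}^\bullet t \setminus t^\bullet \\ 0 & \text{otherwise} \end{cases}$$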


In the following, we denote by $\mathbf{M}$ the representation of a marking $M$ as a vector of dimension $|P|$. Let $N$ be a Petri net, and let $M_1, M_2$ be markings of $N$. The following results are well known from the literature (see e.g. [16]):

- If $M_2$ is reachable from $M_1$ in $N$, then $\mathbf{M_2} = \mathbf{M_1} + \mathbf{N} \cdot X$ for some integer vector $X \geq 0$.

- If $N$ is a marked graph and $\mathbf{M_2} = \mathbf{M_1} + \mathbf{N} \cdot X$ for some rational vector $X \geq 0$, then $M_2$ is reachable from $M_1$ in $N$.

- If $N$ is acyclic and $\mathbf{M_2} = \mathbf{M_1} + \mathbf{N} \cdot X$ for some integer vector $X \geq 0$, then $M_2$ is reachable from $M_1$ in $N$.
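The incidence matrix and the marking equation take only a few lines to compute. This Python sketch assumes ordinary nets (all arc weights 1) given by pre- and post-sets; the function names are hypothetical. It only checks whether a given vector $X$ solves the equation; by the results above, this implies reachability only for the classes of nets listed there.

```python
def incidence_matrix(places, transitions, pre, post):
    """Rows are places, columns are transitions; pre/post map a transition to a set of places."""
    N = []
    for p in places:
        row = []
        for t in transitions:
            if p in post[t] and p not in pre[t]:
                row.append(1)    # p is in the postset of t but not in its preset
            elif p in pre[t] and p not in post[t]:
                row.append(-1)   # p is in the preset of t but not in its postset
            else:
                row.append(0)
        N.append(row)
    return N


def solves_marking_equation(N, m1, m2, x):
    """Check M2 = M1 + N·X componentwise."""
    return all(
        m2[i] == m1[i] + sum(N[i][j] * x[j] for j in range(len(x)))
        for i in range(len(m1))
    )


# Toy net: t1 forks i into a and b, t2 joins a and b into o.
places, transitions = ["i", "a", "b", "o"], ["t1", "t2"]
pre = {"t1": {"i"}, "t2": {"a", "b"}}
post = {"t1": {"a", "b"}, "t2": {"o"}}
N = incidence_matrix(places, transitions, pre, post)
print(N)  # [[-1, 0], [1, -1], [1, -1], [0, 1]]
print(solves_marking_equation(N, [1, 0, 0, 0], [0, 0, 0, 1], [1, 1]))  # True
```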


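Given a workflow net $N = (P, T, F, I, O, \tau)$, let $\mathbf{D} \colon P \to \mathbb{N}$ be the vector defined by $\mathbf{D}(p) = 1$ if $p \in D$ and $\mathbf{D}(p) = 0$ if $p \notin D$, where $D$ is the set of places with positive duration. We define the linear optimization problem

$$\ell \stackrel{\text{def}}{=} \max \{ \mathbf{D} \cdot \mathbf{M} \mid \mathbf{M} = \mathbf{M_I} + \mathbf{N} \cdot X,\ \mathbf{M} \geq 0,\ X \geq 0 \} \qquad (1)$$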


Since the solutions of $\mathbf{M} = \mathbf{M_I} + \mathbf{N} \cdot X$ contain all the reachable markings of $(N, M_I)$, we have $\ell \geq CT(N)$. Further, using the results above, we obtain:


Theorem 2. Let $N$ be a workflow net, and let $\ell_\mathbb{Q}$ and $\ell_\mathbb{Z}$ be the solutions of the linear optimization problem (1) over the rationals and over the integers, respectively. We have:

- $\ell_\mathbb{Q} \geq \ell_\mathbb{Z} \geq CT(N)$;
- if $N$ is a marked graph, then $\ell_\mathbb{Q} = \ell_\mathbb{Z} = CT(N)$;
- if $N$ is acyclic, then $\ell_\mathbb{Q} \geq \ell_\mathbb{Z} = CT(N)$.


In particular, it follows that $CT(N)$ can be computed in polynomial time for marked graphs, acyclic or not. (The result about acyclic nets is used in the next section.)
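Problem (1) over the rationals can be handed to any LP solver (the tool of Sect. 5 uses Cbc). As a self-contained illustration only, the sketch below solves it with a textbook tableau simplex over exact fractions, using Bland's rule to avoid cycling. Because $\mathbf{M_I} \geq 0$, the vector $X = 0$ is feasible, so no phase-one step is needed. The function name and the input encoding (incidence matrix as a list of rows, 0/1 vector for $\mathbf{D}$) are our assumptions. Note that the simplex method is not polynomial in the worst case; the polynomial-time claim above relies on other LP algorithms, such as interior-point methods.

```python
from fractions import Fraction


def lp_concurrency_bound(N, m0, d):
    """Rational optimum of max d·M s.t. M = m0 + N·X, M >= 0, X >= 0 (None if unbounded)."""
    m, n = len(N), len(N[0])
    # d·M = d·m0 + c·X with c = N^T d, so we maximise c·X.
    c = [sum(d[p] * N[p][t] for p in range(m)) for t in range(n)]
    # M >= 0 becomes (-N)·X <= m0; one slack column per place, right-hand side last.
    T = [[Fraction(-N[p][t]) for t in range(n)]
         + [Fraction(int(p == s)) for s in range(m)] + [Fraction(m0[p])]
         for p in range(m)]
    z = [Fraction(-ct) for ct in c] + [Fraction(0)] * (m + 1)  # objective row z - c·X = 0
    basis = [n + p for p in range(m)]  # slacks form the initial basis (X = 0)
    while True:
        # Bland's rule: the smallest index with a negative reduced cost enters.
        enter = next((j for j in range(n + m) if z[j] < 0), None)
        if enter is None:
            return sum(d[p] * m0[p] for p in range(m)) + z[-1]
        rows = [r for r in range(m) if T[r][enter] > 0]
        if not rows:
            return None  # the objective is unbounded
        leave = min(rows, key=lambda r: (T[r][-1] / T[r][enter], basis[r]))
        piv = T[leave][enter]
        T[leave] = [v / piv for v in T[leave]]
        for r in range(m):
            if r != leave and T[r][enter] != 0:
                f = T[r][enter]
                T[r] = [a - f * b for a, b in zip(T[r], T[leave])]
        f = z[enter]
        z = [a - f * b for a, b in zip(z, T[leave])]
        basis[leave] = enter


# Fork/join net i -> {a, b} -> o with D = {i, a, b}; rows are places i, a, b, o.
N = [[-1, 0], [1, -1], [1, -1], [0, 1]]
print(lp_concurrency_bound(N, [1, 0, 0, 0], [1, 1, 1, 0]))  # 2
```

For this marked graph the rational optimum 2 coincides with the concurrency threshold, as Theorem 2 predicts.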


4.2 Concurrency Threshold of Free-Choice Nets 


We study the complexity of computing the concurrency threshold of free-choice 
workflow nets. We first show that, contrary to numerous other properties for 
which there are polynomial algorithms, deciding if the concurrency threshold 
exceeds a given value is NP-complete. 
Theorem 3. The following problem is NP-complete:


Given: A sound, free-choice workflow net $N = (P, T, F, I, O)$, and a number $k \leq |T|$.
Decide: Is the concurrency threshold of $N$ at least $k$?


Proof. A detailed proof can be found in the full version of this paper [15]; here we only sketch the argument. Membership in NP is nontrivial, and follows from results of [1,7]. We prove NP-hardness by means of a reduction from Maximum Independent Set (MIS):


Given: An undirected graph $G = (V, E)$, and a number $k \leq |V|$.
Decide: Is there a set $In \subseteq V$ such that $|In| \geq k$ and $\{v, u\} \notin E$ for every $u, v \in In$?
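To make the source problem of the reduction concrete, here is a brute-force decision procedure for MIS. It is exponential and only meant for tiny graphs; the function name is hypothetical and it plays no role in the reduction itself.

```python
from itertools import combinations


def has_independent_set(vertices, edges, k):
    """Is there In ⊆ V with |In| >= k such that no edge has both endpoints in In?"""
    E = {frozenset(e) for e in edges}
    for size in range(max(k, 0), len(vertices) + 1):
        for cand in combinations(vertices, size):
            if all(frozenset(pair) not in E for pair in combinations(cand, 2)):
                return True
    return False


triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(has_independent_set(["a", "b", "c"], triangle, 1))  # True
print(has_independent_set(["a", "b", "c"], triangle, 2))  # False
```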


Given a graph $G = (V, E)$, we construct a sound free-choice workflow net $N_G$ in polynomial time as follows:


- For each $e = \{v, u\} \in E$ we add to $N_G$ the "gadget net" $N_e$ shown in Fig. 5(a), and for every node $v$ we add the gadget net $N_v$ shown in Fig. 5(b).

- For every $e = \{v, u\} \in E$, we add an arc from the place [e,v]* of $N_e$ to the transition vt of $N_v$, and from [e, u]* to the transition u! of $N_u$.

- The set $I$ of initial places contains the place e° of $N_e$ for every edge $e$; the set $O$ of output places contains the places v? of the nets $N_v$.


Fig. 5. Gadgets for the proof of Theorem 3.


It is easy to see that $N_G$ is free-choice and sound. In [15] we show the result of applying the reduction to a small graph and prove that $G$ has an independent set of size at least $k$ iff the concurrency threshold of $(N_G, M_I)$ is at least $2|E| + k$. The intuition is that for each edge $e \in E$, we fire the transition [e, u]' where $u \notin In$, and for each $v \in In$, we fire the transition vt, thus marking one of [e, u]? or [e, v]? for each edge $e \in E$ and the place v? for each $v \in In$.
4.3 Approximating the Concurrency Threshold


Recall that the solution of problem (1) over the rationals or the integers is always 
an upper bound on the concurrency threshold for any Petri net (Theorem 2). 
The question is whether any stronger result holds when the workflows are sound 
and free-choice. Since computing the concurrency threshold is NP-complete, we 
cannot expect the solution over the rationals, which is computable in polynomial 
time, to provide the exact value. However, it could still be the case that the 
solution over the integers is always exact. Unfortunately, this is not true, and 
we can prove the following results: 


Theorem 4. Given a Petri net $N$, let $\ell_\mathbb{Q}$ and $\ell_\mathbb{Z}$ be as in Theorem 2.


(a) There is an acyclic sound free-choice workflow net $N$ such that $CT(N) < \ell_\mathbb{Q}$.
(b) There is a sound free-choice workflow net $N$ such that $CT(N) < \ell_\mathbb{Z}$.


Proof. For (a), we can take the net obtained by adding to the gadget in Fig. 5(a) a new transition with input places [e, v]* and [e, u]*, and an output place $o$ with weight 2. We take e? as input place. The concurrency threshold is clearly 2, reached, for example, after firing [e, v]!. However, we have $\ell_\mathbb{Q} = 3$, reached by the rational solution $X = (1/2, 1/2, \ldots, 1/2)$. Indeed, the marking equation then yields the marking $M$ satisfying $M([e, v]?) = M([e, u]?) = M(o) = 1/2$.

For (b), we can take the workflow net of Fig. 6. It is easy to see that the concurrency threshold is equal to 1. The marking $M$ that puts one token in each of the two places with weight 1, and no token in the rest of the places, is not reachable from $M_I$. However, it is a solution of the marking equation, even when solved over the integers. Indeed, we have $\mathbf{M} = \mathbf{M_I} + \mathbf{N} \cdot X$ for $X = (1, 0, 1, 1, 0, 0, 1)$. Therefore, the upper bound derived from the marking equation is 2.


Fig. 6. A sound free-choice workflow net for which the linear programming problem derived from the marking equation does not yield the exact value of the concurrency bound, even when solved over the integers.
5 Concurrency Threshold: A Practical Approach


We have implemented a tool¹ to compute an upper bound on the concurrency threshold by constructing a linear program and solving it with the mixed-integer linear programming solver Cbc from the COIN-OR project [14]. Additionally, fixing a number $k$, we used the state-of-the-art Petri net model checker LoLA [19] both to establish a lower bound, by querying LoLA for the existence of a reachable marking $M$ with $conc(M) \geq k$, and to establish an upper bound, by querying LoLA whether all reachable markings $M'$ satisfy $conc(M') \leq k$.

We evaluated the tool on a set of 1386 workflow nets extracted from a collection of five libraries of industrial business processes modeled in the IBM WebSphere Business Modeler [9]. For the concurrency threshold, we set $D = P \setminus O$. These nets may have multiple output places, with a slightly different notion of soundness that allows unmarked output places in the final marking. We applied the transformation described in [12] to ensure that all output places are marked in the final marking. This transformation preserves soundness and the concurrency threshold.

All of the 1386 nets in the benchmark libraries are free-choice nets. Of these, 642 are sound, and we selected them for our experiments. Out of those 642 nets, 409 are marked graphs. Out of the remaining 233 nets, 193 are acyclic and 40 are cyclic. We determined the exact concurrency threshold of all sound nets with LoLA using state-space exploration. Figure 7 shows the distribution of the threshold.


[Figure: bar chart of the number of nets (y-axis) per concurrency threshold value (x-axis); the threshold values shown are 1 to 15, 20, 26, 29, 33, and 66.]

Fig. 7. Distribution of the concurrency threshold of the 642 nets analyzed.


On all 642 sound nets, we computed an upper bound on the concurrency 
threshold using our tool, both using rational and integer variables. We com- 
puted lower and upper bounds using LoLA with the value k = CT(N) of the 
concurrency threshold. We report the results for computing the lower and upper 
bound separately. 

All experiments were performed on the same machine, equipped with an Intel Core i7-6700K CPU and 32 GB of RAM. The results are shown in Table 1.


¹ The tool is available from https://gitlab.lrz.de/i7/macaw.
Using the linear program, we were able to compute an upper bound for all nets in less than 7 s in total, taking at most 30 ms for any single net. LoLA could compute the lower bound for all nets in 6 s. LoLA failed to compute the upper bound in three cases due to reaching the memory limit of 32 GB. For the remaining 639 nets, LoLA could compute the upper bound within 7 min in total.

We give a detailed analysis for the 9 nets with a state space of over one million markings. For three nets with state spaces of sizes $10^9$, $10^{16}$, and $10^{17}$, LoLA reaches the memory limit. For four nets with state spaces between $10^6$ and $10^8$ and a concurrency threshold above 25, LoLA takes 2, 10, 48, and 308 s, respectively. For two nets with a state space of $10^8$ and a concurrency threshold of just 11, LoLA can establish the upper bound in at most 20 ms. The solution of the linear program can be computed in all 9 cases in less than 30 ms.


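Table 1. Statistics on the size and analysis time for the 642 nets analyzed. The times marked with * exclude the 3 nets where LoLA reaches the memory limit.

| | Places | Transitions | Reachable markings | $CT(N)$ | Time $\ell_\mathbb{Q}$ (s) | Time $\ell_\mathbb{Z}$ (s) | Time $CT(N) \geq k$ (s) | Time $CT(N) \leq k$ (s) |
|---|---|---|---|---|---|---|---|---|
| Median | 21 | 14 | 16 | 3 | 0.01 | 0.01 | 0.01 | 0.01 |
| Mean | 28.4 | 18.6 | $3 \cdot 10^{14}$ | 3.7 | 0.01 | 0.01 | 0.01 | 0.58* |
| Max | 262 | 284 | $2 \cdot 10^{17}$ | 66 | 0.03 | 0.03 | 1.18 | 307.76* |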


Comparing the values of the upper bound, we first observed that we obtained the same value using either rational or integer variables, and the difference in computation time between the two was negligible. Second, quite surprisingly, we noticed that the upper bound obtained from the linear program is exact in all of our cases, even for the cyclic ones. Further, in several cases it can be computed much faster than the upper bound obtained by LoLA, and it gives a bound in all cases, even when the state-space exploration reaches its limit. By combining linear programming for the upper bound and state-space exploration for the lower bound, the exact value was computed for every net in our benchmark set within a few seconds.
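The combination of the two bounds can be phrased as a simple loop. In this sketch, which is our own illustration and not the tool's interface, `lp_upper_bound` is the (rounded-down) value of problem (1), and `reachable_with_conc_at_least(k)` is a hypothetical oracle standing for a model-checker query such as the LoLA lower-bound check. Since the LP value is an upper bound, the first $k$, counting down, for which the query succeeds is $CT(N)$. When the LP bound is exact, the loop stops at its first iteration.

```python
from typing import Callable


def exact_threshold(lp_upper_bound: int,
                    reachable_with_conc_at_least: Callable[[int], bool]) -> int:
    """CT(N) from an upper bound and an oracle for 'some reachable M has conc(M) >= k'."""
    for k in range(lp_upper_bound, 0, -1):
        if reachable_with_conc_at_least(k):
            return k  # k <= CT(N), and every larger value up to the bound was refuted
    return 0


# Hypothetical oracle for a net with CT(N) = 3 and an LP bound of 5.
print(exact_threshold(5, lambda k: k <= 3))  # 3
```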


6 Conclusion 


Planning sufficient execution resources for a business or production process is 
a crucial part of process engineering [3,13,20]. We considered a simple version 
of this problem in which resources are uniform and tasks are not interrupt- 
ible. We studied the complexity of computing the resource threshold, i.e., the 
minimal number of resources allowing an optimal makespan. We showed that 
deciding if the resource threshold exceeds a given bound is NP-hard even for 
acyclic marked graphs. For this reason, we investigated the complexity of com- 
puting the concurrency threshold, an upper bound of the resource threshold 
introduced in [4]. Solving a problem left open in [4], we showed that deciding if 
the concurrency threshold exceeds a given bound is NP-hard for general sound
free-choice workflow nets. We then presented a polynomial-time approximation 
algorithm, and showed experimentally that it computes the exact value of the 
concurrency threshold for all benchmarks of a standard suite of free-choice work- 
flow nets. 
Abstract. We study the fine-grained complexity of Leader Contributor
Reachability (LCR) and Bounded-Stage Reachability (BSR), two vari- 
ants of the safety verification problem for shared-memory concurrent 
programs. For both problems, the memory is a single variable over a 
finite data domain. We contribute new verification algorithms and lower 
bounds based on the Exponential Time Hypothesis (ETH) and kernels. 

LCR is the question whether a designated leader thread can reach an unsafe state when interacting with a certain number of equal contributor threads. We suggest two parameterizations: (1) by the size of the data domain D and the size of the leader L, and (2) by the size of the contributors C. We present two algorithms, running in $O^*((L \cdot (D+1))^{L \cdot D} \cdot D^D)$ and $O^*(4^C)$ time, showing that both parameterizations are fixed-parameter tractable. Further, we suggest a modification of the first algorithm suitable for practical instances. The upper bounds are complemented by (matching) lower bounds based on ETH and kernels.

For BSR, we consider programs involving t different threads. We restrict the analysis to computations where the write permission changes s times between the threads. BSR asks whether a given configuration is reachable via such an s-stage computation. When parameterized by P, the maximum size of a thread, and t, the interesting observation is that the problem has a large number of difficult instances. Formally, we show that there is no polynomial kernel, i.e., no compression algorithm that reduces D or s to a polynomial dependence on P and t. This indicates that symbolic methods may be harder to find for this problem.

A full version of the paper is available as [9]. 


1 Introduction 


We study the fine-grained complexity of two safety verification problems [1,16,27] for shared-memory concurrent programs. We are motivated to reconsider these problems by recent developments in fine-grained complexity theory [6,10,30,33]. They suggest that classifications such as NP or even FPT are too coarse to explain the success of verification methods. Instead, it should be possible to identify the precise influence that parameters of the input have on the verification time. Our contribution confirms this idea. We give new verification algorithms for the two problems that, for the first time, can be proven optimal in the sense of fine-grained complexity theory. To state the results, we need some background. As we proceed, we explain the development of fine-grained complexity theory.
There is a well-known gap between the success that verification tools see in practice and the judgments about computational hardness that worst-case complexity is able to give. The applicability of verification tools steadily increases by tuning them towards industrial instances. Complexity estimation, by contrast, is stuck with considering the input size (or at best assumes certain parameters to be constant, which does not mean much if the runtime is then $n^k$, where $n$ is the input size and $k$ the parameter).

The observation of a gap between practical algorithms and complexity theory 
is not unique to verification but made in every field that has to solve computa- 
tionally hard problems. Complexity theory has taken up the challenge to close 
the gap. So-called fixed-parameter tractability (FPT) [11,13] proposes to identify parameters $k$ so that the runtime is $f(k) \cdot poly(n)$, where $f$ is a computable function. These parameters are powerful in the sense that they dominate the complexity.

For an FPT result to be useful, the function $f$ should only be mildly exponential, and of course $k$ should be small in the instances of interest. Intuitively, such parameters are what one needs to optimize. Fine-grained complexity is the study of upper and lower bounds on the function $f$. Indeed, the fine-grained complexity of a problem is written as $O^*(f(k))$, emphasizing $f$ and $k$ and suppressing the polynomial part. For upper bounds, the approach is still to come up with an algorithm.

For lower bounds, fine-grained complexity has taken a new and very pragmatic perspective. For the problem of $n$-variable 3-SAT the best known algorithm runs in $2^n$, and this bound has not been improved since 1970. The idea is to assume that substantial improvements on this problem are unlikely; this assumption is known as the exponential-time hypothesis (ETH) [30]. ETH serves as a lower bound that is reduced to other problems [33]. An even stronger assumption about $n$-variable SAT, called SETH [6,30], and a similar one about Set Cover [10] allow for lower bounds like the absence of $(2 - \varepsilon)^n$ algorithms.

In this work, we contribute fine-grained complexity results for verification 
problems on concurrent programs. The first problem is reachability for a leader 
thread that is interacting with an unbounded number of contributors (LCR) [16, 
27]. We show that, assuming a parameterization by the size of the leader L and the size of the data domain D, the problem can be solved in $O^*((L \cdot (D+1))^{L \cdot D} \cdot D^D)$.
At the heart of the algorithm is a compression of computations into witnesses. To 
check reachability, our algorithm then iterates over candidates for witnesses and 
checks each of them for being a proper witness. Interestingly, we can formulate 
a variant of the algorithm that seems to be suited for large state spaces. 

Using ETH, we show that the algorithm is (almost) optimal. Moreover, the 
problem is shown to have a large number of hard instances. Technically, there is 
no polynomial kernel [4,5]. Experience with kernel lower bounds is still limited. 
This notion of hardness seems to indicate that symbolic methods are hard to 
apply to the problem. The lower bounds that we present share similarities with 
the reductions from [7,24, 25]. 
If we consider the size of the contributors a parameter, we obtain a singly
exponential upper bound that we also prove to be tight. The saturation-based 
technique that we use is inspired by thread-modular reasoning [20,21, 26,29]. 

The second problem we study generalizes bounded context switching. 
Bounded-stage reachability (BSR) asks whether a state is reachable if there is 
a bound s on the number of times the write permission is allowed to change 
between the threads [1]. Again, we show a kernel lower bound of this new form.
The result is tricky and highlights the power of the computation model. 

The results are summarized by the table below. Two findings stand out; we highlight them in gray. We present a new algorithm for LCR. Moreover, we suggest kernel lower bounds as hardness indicators for verification problems. The lower bound for BSR is particularly difficult to achieve.

| Problem | Upper bound | Lower bound | Kernel |
|---|---|---|---|
| LCR(D, L) | $O^*((L \cdot (D+1))^{L \cdot D} \cdot D^D)$ | $2^{o(\sqrt{L \cdot D} \cdot \log(L \cdot D))}$ | No poly. |
| LCR(C) | $O^*(4^C)$ | $2^{o(C)}$ | No poly. |
| BSR(P, t) | $O^*(P^{2t})$ | $2^{o(t \cdot \log(P))}$ | No poly. |


Related Work. Concurrent programs communicating through a shared mem- 
ory and having a fixed number of threads have been extensively studied 
[2,14,22,28]. The leader contributor reachability problem as considered in this
paper was introduced as parametrized reachability in [27]. In [16], it was shown 
to be NP-complete when only finite-state programs are involved and PSPACE- 
complete for recursive programs. In [31], the parameterized pairwise-reachability 
problem was considered and shown to be decidable. Parameterized reachability 
under a variant of round-robin scheduling was proven decidable in [32]. 

The bounded-stage restriction on the computations of concurrent programs 
as considered here was introduced in [1]. The corresponding reachability problem 
was shown to be NP-complete when only finite-state programs are involved. The 
problem remains in NEXP-time and PSPACE-hard for a combination of counters 
and a single pushdown. The bounded-stage restriction generalizes the concept 
of bounded context switching from [34], which was shown to be NP-complete in 
that paper. In [8], FPT algorithms for bounded context switching were obtained 
under various parameterizations. In [3], networks of pushdowns communicating
through a shared memory were analyzed under various topological restrictions. 

There have been few efforts to obtain fixed-parameter-tractable algorithms 
for automata and verification-related problems. FPT algorithms for automata 
problems have been studied in [18,19,35]. In [12], model-checking problems for 
synchronized executions on parallel components were considered and proven 
intractable. In [15], the notion of conflict serializability was introduced for the 
TSO memory model and an FPT algorithm for checking serializability was pro- 
vided. The complexity of predicting atomicity violations on concurrent systems 
was considered in [17]. The finding is that FPT solutions are unlikely to exist. 
2 Preliminaries


We introduce our model for programs, which is fairly standard and taken from [1, 
16,27], and give the basics on fixed-parameter tractability. 


Programs. A program consists of finitely many threads that access a shared memory. The memory is modeled to hold a single value at a time. Formally, a (shared-memory) program is a tuple $\mathcal{A} = (D, a^0, (P_i)_{i \in [1..t]})$. Here, $D$ is the data domain of the memory and $a^0 \in D$ is the initial value. Threads are modeled as control-flow graphs that write values to or read values from the memory. These operations are captured by $Op(D) = \{!a, ?a \mid a \in D\}$. We use the notation $W(D) = \{!a \mid a \in D\}$ for the write operations and $R(D) = \{?a \mid a \in D\}$ for the read operations. A thread $P_{id}$ is a non-deterministic finite automaton $(Op(D), Q, q^0, \delta)$ over the alphabet of operations. The set of states is $Q$ with $q^0 \in Q$ the initial state. The final states will depend on the verification task. The transition relation is $\delta \subseteq Q \times (Op(D) \cup \{\varepsilon\}) \times Q$. We extend it to words and also write $q \xrightarrow{w} q'$ for $q' \in \delta(q, w)$. Whenever we need to distinguish between different threads, we add indices and write $Q_{id}$ or $\delta_{id}$.

The semantics of a program is given in terms of labeled transitions between configurations. A configuration is a pair $(pc, a) \in (Q_1 \times \cdots \times Q_t) \times D$. The program counter $pc$ is a vector that shows the current state $pc(i) \in Q_i$ of each thread $P_i$. Moreover, the configuration gives the current value in memory. We call $c^0 = (pc^0, a^0)$ with $pc^0(i) = q^0_i$ for all $i \in [1..t]$ the initial configuration. Let $C$ denote the set of all configurations. The transition relation among configurations ${\rightarrow} \subseteq C \times (Op(D) \cup \{\varepsilon\}) \times C$ is obtained by lifting the transition relations of the threads. To define it, let $pc_1 = pc[i = q_i]$, meaning thread $P_i$ is in state $q_i$ and otherwise the program counter coincides with $pc$. Let $pc_2 = pc[i = q'_i]$. If thread $P_i$ tries to read with the transition $q_i \xrightarrow{?a} q'_i$, then $(pc_1, a) \xrightarrow{?a} (pc_2, a)$. Note that the memory is required to hold the desired value. If the thread has the transition $q_i \xrightarrow{!b} q'_i$, then $(pc_1, a) \xrightarrow{!b} (pc_2, b)$. Finally, $q_i \xrightarrow{\varepsilon} q'_i$ yields $(pc_1, a) \xrightarrow{\varepsilon} (pc_2, a)$. The program's transition relation is generalized to words, $c \xrightarrow{w} c'$. We call such a sequence of consecutive labeled transitions a computation. To indicate that there is a word that justifies a computation from $c$ to $c'$, we write $c \rightarrow^* c'$. We may use an index $\rightarrow_i$ to indicate that the computation was induced by thread $P_i$. Where appropriate, we also use the program as an index, $\rightarrow_{\mathcal{A}}$.
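The lifted transition relation can be explored directly. A minimal Python sketch, assuming each thread is encoded as a dict from states to lists of `(op, successor)` pairs with `op` being `("!", a)`, `("?", a)`, or `None` for $\varepsilon$; the encoding and the function name are hypothetical. It enumerates all configurations reachable from $c^0$ by breadth-first search.

```python
from collections import deque


def reachable_configurations(threads, initial_states, a0):
    """All configurations (pc, a) reachable from c^0 = (initial_states, a0)."""
    start = (tuple(initial_states), a0)
    seen, queue = {start}, deque([start])
    while queue:
        pc, a = queue.popleft()
        for i, delta in enumerate(threads):
            for op, succ in delta.get(pc[i], []):
                if op is None:
                    b = a                # epsilon move: memory unchanged
                elif op[0] == "?":
                    if op[1] != a:
                        continue         # a read requires the memory to hold the value
                    b = a
                else:
                    b = op[1]            # a write !b overwrites the memory with b
                nxt = (pc[:i] + (succ,) + pc[i + 1:], b)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen


writer = {"q0": [(("!", 1), "q1")]}
reader = {"r0": [(("?", 1), "r1")]}
configs = reachable_configurations([writer, reader], ["q0", "r0"], 0)
print((("q1", "r1"), 1) in configs)  # True: the reader can read 1 after the write
```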


Fixed-Parameter Tractability. We wish to study the fine-grained complexity of safety verification problems for the above programs. This means our goal is to identify parameters of these problems that have two properties. First, in practical instances they are small. Second, assuming that these parameters are small, there are efficient verification algorithms. Parameterized complexity makes precise the idea of an algorithm being efficient relative to a parameter.

A parameterized problem $L$ is a subset of $\Sigma^* \times \mathbb{N}$. The problem is fixed-parameter tractable if there is a deterministic algorithm that, given $(x, k) \in \Sigma^* \times \mathbb{N}$, decides $(x, k) \in L$ in time $f(k) \cdot |x|^{O(1)}$. We use FPT for the class of all fixed-parameter-tractable problems and say a problem is FPT to mean it is in that class. Note that $f$ is a computable function that only depends on the parameter $k$. It is common to denote the runtime by $O^*(f(k))$ and suppress the polynomial part. We will be interested in the precise dependence on the parameter, in upper and lower bounds on the function $f$. This study is often referred to as fine-grained complexity.

Lower bounds on $f$ are obtained by the Exponential Time Hypothesis (ETH). It assumes that there is no algorithm solving $n$-variable 3-SAT in $2^{o(n)}$ time. The reasoning is as follows: if $f$ dropped below a certain bound, ETH would fail.

While many parameterizations of NP-hard problems were proven to be fixed- 
parameter tractable, there are problems that are unlikely to be FPT. Such prob- 
lems are hard for the complexity class W[1]. The appropriate notion of reduction 
for a theory of relative hardness in parameterized complexity is called parame- 
terized reduction. 


3 Leader Contributor Reachability 


We consider the leader contributor reachability problem for shared-memory pro- 
grams. The problem was introduced in [27] and shown to be NP-complete in 
[16] for the finite-state case.¹ We contribute two new verification algorithms
that target two parameterizations of the problem. In both cases, our algorithms 
establish fixed-parameter tractability. Moreover, with matching lower bounds we 
prove them to be optimal even in the fine-grained sense. 

An instance of the leader contributor reachability problem is given by a
shared-memory program of the form $\mathcal{A} = (D, a^0, (P_L, (P_i)_{i\in[1..d]}))$. The program
has a designated leader thread $P_L$ and several contributor threads $P_1, \dots, P_d$. In
addition, we are given a set of unsafe states for the leader. The task is to check
whether the leader can reach an unsafe state when interacting with a number of
instances of the contributors. It is worth noting that the problem can be reduced
to having a single contributor. Let the corresponding thread $P_C$ be the union
of $P_1, \dots, P_d$ (constructed using an initial ε-transition). We base our complexity
analysis on this simplified formulation of the problem.
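The union construction can be illustrated with a small Python sketch. It assumes a hypothetical encoding of threads as NFAs whose transitions are triples (source, label, target), with labels ("w", a) for !a, ("r", a) for ?a, and ("eps", None) for ε; the names `Thread`, `EPS`, and `union_contributors` are ours, not the paper's.

```python
from dataclasses import dataclass

EPS = ("eps", None)  # hypothetical label for an ε-transition

@dataclass
class Thread:
    """A thread as an NFA over write labels ("w", a) and read labels ("r", a)."""
    states: set
    init: object
    trans: set  # triples (source, label, target)

def union_contributors(threads):
    """Build P_C from P_1, ..., P_d: a fresh initial state with one
    ε-transition into each (disjointly tagged) copy of a contributor."""
    init = ("init_C",)
    states, trans = {init}, set()
    for idx, t in enumerate(threads):
        # Tag every state with the thread index so the copies stay disjoint.
        states |= {(idx, q) for q in t.states}
        trans |= {((idx, p), lab, (idx, q)) for (p, lab, q) in t.trans}
        # The initial ε-transition decides which contributor to behave like.
        trans.add((init, EPS, (idx, t.init)))
    return Thread(states, init, trans)
```

Since every copy of $P_C$ first commits to one of the $P_i$, a program with t copies of $P_C$ can mimic any mix of contributor instances.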

For the definition, let $\mathcal{A} = (D, a^0, (P_L, P_C))$ be a program with two threads.
Let $F_L \subseteq Q_L$ be a set of unsafe states of the leader. For $t \in \mathbb{N}$, define the program
$\mathcal{A}^t = (D, a^0, (P_L, (P_C)_{i\in[1..t]}))$ to have t copies of $P_C$. Further, let $C^t_f$ be the set
of configurations where the leader is in an unsafe state (from $F_L$). The problem
of interest is as follows:


Leader Contributor Reachability (LCR) 
Input: A program $\mathcal{A} = (D, a^0, (P_L, P_C))$ and a set of states $F_L \subseteq Q_L$.
Question: Is there a $t \in \mathbb{N}$ such that $c^0 \to^*_{\mathcal{A}^t} c$ for some $c \in C^t_f$?


¹ The problem is called parameterized reachability in these works. We renamed it to
avoid confusion with parameterized complexity. 
We consider two parameterizations of LCR. First, we parameterize by D, the
size of the data domain D, and L, the number of states of the leader $P_L$. We
denote the parameterization by LCR(D,L). While we obtain an FPT algorithm for
LCR(D,L), it is unlikely that LCR(D) and LCR(L) admit one: both
parameterizations are W[1]-hard. For details, we refer to the full version [9].

The second parameterization that we consider is LCR(C), a parameterization
by the number of states of the contributor $P_C$. We prove that this parameter alone is
enough to obtain an FPT algorithm.


3.1 Parameterization by Memory and Leader 


We give an algorithm that solves LCR in time $O^*((L\cdot(D+1))^{L\cdot D}\cdot D^D)$, which means
LCR(D,L) is FPT. We then show how to modify the algorithm to solve instances 
of LCR as they are likely to occur in practice. Interestingly, the modified version 
of the algorithm lends itself to an efficient implementation based on off-the-shelf 
sequential model checkers. We conclude with lower bounds for LCR(D, L). 


Upper Bound. We give an algorithm for the parameterization LCR(D, L). The 
key idea is to compactly represent computations that may be present in an 
instance of the given program. To this end, we introduce a domain of so-called 
witness candidates. The main technical result, Lemma 4, links computations and 
witness candidates. It shows that reachability of an unsafe state holds in an 
instance of the program if and only if there is a witness candidate that is valid 
(in a precise sense). With this, our algorithm iterates over all witness candidates
and checks each of them for validity. To state the overall result, let
$\mathrm{Wit}(L,D) = (L\cdot(D+1))^{L\cdot D}\cdot D^D\cdot L$ be the number of witness candidates and let
$\mathrm{Valid}(L,D,C) = L^3\cdot D^2\cdot C^2$ be the time it takes to check validity of a candidate.
Note that the latter is polynomial.


Theorem 1. LCR can be solved in time $O(\mathrm{Wit}(L,D)\cdot\mathrm{Valid}(L,D,C))$.


Let $\mathcal{A} = (D, a^0, (P_L, P_C))$ be the program of interest and $F_L$ be the set of
unsafe states in the leader. Assume we are given a computation ρ showing that
$P_L$ can reach a state in $F_L$ when interacting with a number of contributors. We
explain the main ideas to find an efficient representation for ρ that still allows
for the reconstruction of a similar computation. To simplify the presentation, we
assume the leader never writes (!a) and immediately reads (?a) the same value.
If this is the case, the read can be replaced by ε.

In a first step, we delete most of the moves in ρ that were carried out by
contributors. We only keep first writes. For each value a, this is the write
transition $fw(a) = c \xrightarrow{!a} c'$ where a is written by a contributor for the first time. The
reason we can omit subsequent writes of a is the following: If fw(a) is carried
out by contributor $P_1$, we can assume that there is an arbitrary number of other
contributors that all mimicked the behavior of $P_1$. This means whenever $P_1$ did
a transition, they copycatted it right away. Hence, there are arbitrarily many
contributors pending to write a. Phrased differently, the symbol a is available
for the leader whenever $P_L$ needs to read it. The idea goes back to the Copycat
Lemma stated in [16]. The reads of the contributors are omitted as well. We will
make sure they can be served by the first writes and the moves done by $P_L$.
After the deletion, we are left with a shorter expression ρ'. We turn it into a
word w over the alphabet $Q_L \cup D_\perp \cup \bar D$ with $D_\perp = D \cup \{\perp\}$ and $\bar D = \{\bar a \mid a \in D\}$.


Each transition $c \to c'$ in ρ' that is due to the leader moving from q to
q' is mapped (i) to $q.a.q'$ if it is a write of a and (ii) to $q.\perp.q'$ otherwise. A first
write $fw(a) = c \xrightarrow{!a} c'$ of a contributor is mapped to $\bar a$. We may assume that
the resulting word w is of the form $w = w_1.w_2$ with $w_1 \in ((Q_L.D_\perp)^*.\bar D)^*$ and
$w_2 \in (Q_L.D_\perp)^*.F_L$. Note that w can still be of unbounded length.

In order to find a witness of bounded length, we compress $w_1$ and $w_2$ to
$w_1'$ and $w_2'$. Between two first writes $\bar a$ and $\bar b$ in $w_1$, the leader can perform an
unbounded number of transitions, represented by a word in $(Q_L.D_\perp)^*$. If this word
contains more than L states, some state $q \in Q_L$ repeats between $\bar a$ and $\bar b$. We contract the word between
the first and the last occurrence of q into just a single state q. This state now
represents a loop on $P_L$. Since the leader has L states, after all contractions at most L states
remain between two first writes. Furthermore, the number of first writes is
bounded by D, since each symbol can be written for the first time at most once. Thus,
the compressed string $w_1'$ is in the language $((Q_L.D_\perp)^{\le L}.\bar D)^{\le D}$.
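A minimal Python sketch of the contraction step, assuming a hypothetical encoding of a segment between two first writes as a list of (state, symbol) pairs from $Q_L \times D_\perp$, where "bot" stands for ⊥. Each state is kept only once, at its last occurrence, so the contracted segment has pairwise distinct states and hence at most L of them. The function name `contract` is ours.

```python
def contract(segment):
    """Contract a segment [(q_1, a_1), ..., (q_n, a_n)] of leader moves.

    Whenever a state q occurs more than once, the part between its first
    and last occurrence is replaced by the single state q, which then
    stands for a loop of P_L. The symbol kept is the one after the last q.
    """
    result = []
    i = 0
    while i < len(segment):
        q = segment[i][0]
        # Last position (from i on) at which q occurs; q never occurs after it.
        last = max(j for j in range(i, len(segment)) if segment[j][0] == q)
        result.append((q, segment[last][1]))
        i = last + 1
    return result
```

For example, `contract([("p", "bot"), ("q", "a"), ("r", "bot"), ("q", "b"), ("s", "bot")])` removes the loop q, r, q and yields `[("p", "bot"), ("q", "b"), ("s", "bot")]`.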

The word $w_2$ is of the form $w_2 = q.u$ for a state $q \in Q_L$ and a word u.
We truncate the word u and only keep the state q. Then we know that there is
a computation leading from q to a state in $F_L$ where $P_L$ can potentially write
any symbol but read only those symbols which occurred as a first write in $w_1'$.
Altogether, we are left with a word of bounded length.


Definition 2. The set of witness candidates is $\mathcal{E} = ((Q_L.D_\perp)^{\le L}.\bar D)^{\le D}.Q_L$.


To characterize computations in terms of witness candidates, we define the
notion of validity. This needs some notation. Consider a word $w = w_1 \dots w_\ell$
over some alphabet Γ. For $i \in [1..\ell]$, we set $w[i] = w_i$ and $w[1..i] = w_1 \dots w_i$. If
$\Gamma' \subseteq \Gamma$, we use $w\!\downarrow_{\Gamma'}$ for the projection of w to the letters in Γ'.

Consider a witness candidate $w \in \mathcal{E}$ and let $i \in [1..|w|]$. We use D(w, i)
for the set of all first writes that occurred in w up to position i. Formally,
$D(w,i) = \{a \mid \bar a \text{ is a letter in } w[1..i]\!\downarrow_{\bar D}\}$. We abbreviate D(w, |w|) as D(w). Let
$q \in Q_L$ and $S \subseteq D$. Recall that the state represents a loop in $P_L$. The set of all
letters written within a loop from q to q when reading only symbols from S is
$\mathrm{Loop}(q,S) = \{a \mid a \in D \text{ and } \exists v_1, v_2 \in (W(D) \cup R(S))^* : q \xrightarrow{v_1\,!a\,v_2}_{P_L} q\}$.
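Loop(q, S) can be computed with two graph searches. The Python sketch below assumes the same hypothetical transition encoding (triples with labels ("w", a), ("r", a), ("eps", None)) and additionally assumes that ε-moves may occur inside $v_1$ and $v_2$. A letter a lies in Loop(q, S) exactly when some write transition $p \xrightarrow{!a} p'$ has p reachable from q and q reachable from p', using only writes, ε-moves, and reads of symbols in S. The names `loop` and `_reach` are ours.

```python
from collections import defaultdict, deque

def _reach(start, succ):
    """All nodes reachable from start in the graph succ (BFS)."""
    seen, todo = {start}, deque([start])
    while todo:
        p = todo.popleft()
        for r in succ[p]:
            if r not in seen:
                seen.add(r)
                todo.append(r)
    return seen

def loop(q, S, trans):
    """Letters a with q --v1 !a v2--> q, where v1, v2 use writes, ε, and reads of S."""
    allowed = [(p, (kind, a), r) for (p, (kind, a), r) in trans
               if kind in ("w", "eps") or (kind == "r" and a in S)]
    fwd, bwd = defaultdict(set), defaultdict(set)
    for p, _, r in allowed:
        fwd[p].add(r)
        bwd[r].add(p)
    from_q = _reach(q, fwd)  # states reachable from q
    to_q = _reach(q, bwd)    # states from which q is reachable
    return {a for (p, (kind, a), r) in allowed
            if kind == "w" and p in from_q and r in to_q}
```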
The definition of validity is given next. The three requirements are made 
precise in the text below. 


Definition 3. A witness candidate $w \in \mathcal{E}$ is valid if it satisfies the following
properties: (1) First writes are unique. (2) The word w encodes a run in $P_L$. (3)
There are supportive computations on the contributors.


(1) If $w\!\downarrow_{\bar D} = \bar c_1 \dots \bar c_m$, then the $\bar c_i$ are pairwise different.
(2) Let $w\!\downarrow_{Q_L \cup D_\perp} = q_1 a_1 q_2 a_2 \dots a_\ell q_{\ell+1}$. If $a_i \in D$, then $q_i \xrightarrow{!a_i}_{P_L} q_{i+1}$ is a
write transition of $P_L$. If $a_i = \perp$, then we have an ε-transition $q_i \xrightarrow{\varepsilon}_{P_L} q_{i+1}$.
Alternatively, there is a read $q_i \xrightarrow{?a}_{P_L} q_{i+1}$ of a symbol $a \in D(w, pos(a_i))$
that already occurred within a first write (the leader does not read its own
writes). Here, we use $pos(a_i)$ to access the position of $a_i$ in w. State $q_1 = q^0_L$
is initial. There is a run from $q_{\ell+1}$ to a state $q_f \in F_L$. During this run,
reading is restricted to symbols that occurred as first writes in w. Formally,
there is a $v \in (W(D) \cup R(D(w)))^*$ such that $q_{\ell+1} \xrightarrow{v}_{P_L} q_f$.

(3) For each prefix $v\bar a$ of w with $\bar a \in \bar D$ there is a computation $q^0_C \xrightarrow{u\,!a}_{P_C} q$ on $P_C$ so
that the reads in u can be obtained from v. Formally, let $u' = u\!\downarrow_{R(D)}$. Then
there is an embedding of u' into v, a monotone map $\mu: [1..|u'|] \to [1..|v|]$ that
satisfies the following. Let $u'[i] = ?a$ with $a \in D$. The read is served in one of
the following three ways. We may have $v[\mu(i)] = a$, which corresponds to a
write of a by $P_L$. Alternatively, $v[\mu(i)] = q \in Q_L$ and $a \in \mathrm{Loop}(q, D(w, \mu(i)))$.
This amounts to reading from a leader's write that was executed in a loop.
Finally, we may have $a \in D(w, \mu(i))$, corresponding to reading from another
contributor.
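Property (1) and the sets D(w, i) are straightforward to compute. The following Python sketch assumes a hypothetical encoding of a witness candidate as a list in which first writes $\bar a$ are tuples ("fw", a) and all other letters (states and symbols of $D_\perp$) are plain strings; positions are 1-indexed as in the text. The function names are ours.

```python
def is_first_write(x):
    return isinstance(x, tuple) and x[0] == "fw"

def first_writes(w, i=None):
    """D(w, i): symbols whose first write occurs in w[1..i]; D(w) if i is None."""
    prefix = w if i is None else w[:i]
    return {x[1] for x in prefix if is_first_write(x)}

def first_writes_unique(w):
    """Property (1): the projection of w to the first writes repeats no letter."""
    fws = [x[1] for x in w if is_first_write(x)]
    return len(fws) == len(set(fws))
```

Properties (2) and (3) additionally need the automata $P_L$ and $P_C$; the set $D(w, pos(a_i))$ used in (2) is `first_writes(w, pos)` in this encoding.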


Lemma 4. There is a $t \in \mathbb{N}$ so that $c^0 \to^*_{\mathcal{A}^t} c$ with $c \in C^t_f$ if and only if there
is a valid witness candidate $w \in \mathcal{E}$.


Our algorithm iterates over all witness candidates $w \in \mathcal{E}$ and tests whether w
is valid. The number of candidates Wit(L, D) is given by $(L\cdot(D+1))^{L\cdot D}\cdot D^D\cdot L$. This
is due to the fact that we can force a witness candidate to have maximum length
by inserting padding symbols. The number of candidates constitutes the first
factor of the runtime stated in Theorem 1. The polynomial factor Valid(L,D,C) is
due to the following lemma. Details are given in the full version of the paper [9].
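To get a feeling for the two factors of Theorem 1, the formulas for Wit(L, D) and Valid(L, D, C) can simply be evaluated. The Python sketch below does only this arithmetic; the function names are ours.

```python
def wit(L, D):
    """Number of witness candidates: (L*(D+1))^(L*D) * D^D * L."""
    return (L * (D + 1)) ** (L * D) * D ** D * L

def valid(L, D, C):
    """Time bound for one validity check (Lemma 5): L^3 * D^2 * C^2."""
    return L ** 3 * D ** 2 * C ** 2

for L, D in [(2, 1), (3, 2), (4, 3)]:
    print(L, D, wit(L, D), valid(L, D, C=10))
```

For instance, wit(3, 2) = 9^6 · 4 · 3 = 6377292. The exponential blow-up stems from L · D alone, while C only enters through the polynomial validity check.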


Lemma 5. Validity of $w \in \mathcal{E}$ can be checked in time $O(L^3\cdot D^2\cdot C^2)$.


Practical Algorithm. We improve the above algorithm so that it should work 
well on practical instances. The idea is to factorize the leader along its strongly 
connected components (SCCs), the number of which is assumed to be small in real 
programs. Technically, our improved algorithm works with valid SCC-witnesses. 
They symbolically represent SCCs rather than loops in the leader. To state the 
complexity, we define the straight-line depth, the number of SCCs the leader may 
visit during a computation. The definition needs a graph construction. 

Let $V \subseteq D^*$ be the set of all words over D that do not repeat letters. Let $r = c_1 \dots c_\ell \in V$
and $i \in [0..\ell]$. By $P_L\!\downarrow_i$ we denote the automaton obtained from $P_L$ by removing
all transitions that read a value outside $\{c_1, \dots, c_i\}$. Let $SCC(P_L\!\downarrow_i)$ denote the
set of all SCCs in this automaton. We construct the directed graph $G(P_L, r)$ as
follows. The vertices are the SCCs of all $P_L\!\downarrow_i$, $i \in [0..\ell]$. There is an edge between distinct
$S, S' \in SCC(P_L\!\downarrow_i)$ if there are states $q \in S$, $q' \in S'$ with $q \to q'$ in $P_L\!\downarrow_i$. If
$S \in SCC(P_L\!\downarrow_{i-1})$ and $S' \in SCC(P_L\!\downarrow_i)$, we only get an edge if we can get from S to
S' by reading $c_i$. Note that the graph is acyclic.

The depth d(r) of $P_L$ relative to r is the length of the longest path in $G(P_L, r)$.
The straight-line depth is $d = \max\{d(r) \mid r \in V\}$. The number of SCCs s is
the size of $SCC(P_L\!\downarrow_0)$. With these values at hand, the number of SCC-witness
candidates (the definition of which can be found in the full version [9]) can be
bounded by $\mathrm{Wit}_{SCC}(s,D,d) \le (s\cdot(D+1))^{d}\cdot D^{D}\cdot 2^{D+d}$. The time needed to test
whether a candidate is valid is $\mathrm{Valid}_{SCC}(L,D,C,d) = L^3\cdot D\cdot C^2\cdot d^2$.
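Computing the depth d(r) amounts to a longest-path computation in the acyclic graph $G(P_L, r)$. The Python sketch below uses the standard library's `graphlib` (Python 3.9+) and assumes the graph is already given as a successor map over SCC identifiers; building $G(P_L, r)$ itself is omitted. We count the SCCs on a path, matching the reading of d as the number of SCCs the leader may visit; counting edges instead would give one less. The function name `depth` is ours.

```python
from graphlib import TopologicalSorter

def depth(succ):
    """Number of vertices on a longest path of a DAG given as {v: successors}."""
    preds = {v: set() for v in succ}
    for v, ws in succ.items():
        for w in ws:
            preds.setdefault(w, set()).add(v)
    longest = {}
    # static_order() lists every vertex after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        longest[v] = 1 + max((longest[u] for u in preds[v]), default=0)
    return max(longest.values(), default=0)
```

The straight-line depth d is then the maximum of this value over all $r \in V$.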


Theorem 6. LCR can be solved in time $O(\mathrm{Wit}_{SCC}(s,D,d)\cdot\mathrm{Valid}_{SCC}(L,D,C,d))$.


For this algorithm, what matters is that the leader's state space decomposes into few
strongly connected components, that is, s and d are small. The number of states L has
limited impact on the runtime: it only enters polynomially.


Lower Bound. We prove that the algorithm from Theorem 1 is only a root
factor away from being optimal: A $2^{o(\sqrt{L\cdot D}\cdot\log(L\cdot D))}$-time algorithm for LCR would
contradict ETH. We achieve the lower bound by a reduction from k × k Clique,
the problem of finding a clique of size k in a graph the vertices of which are
elements of a k × k matrix. Moreover, the clique has to contain one vertex from
each row. Unless ETH fails, the problem cannot be solved in time $2^{o(k\log k)}$ [33].
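The root factor can be read off by comparing exponents. As a worked calculation (assuming L, D ≥ 2 so that all logarithms are positive), write the non-polynomial part of the bound of Theorem 1 as $2^{E}$:

```latex
\begin{align*}
E &= \log_2\big((L\cdot(D+1))^{L\cdot D}\cdot D^{D}\big)
   = L\cdot D\cdot\log_2\big(L\cdot(D+1)\big) + D\cdot\log_2 D
   = O\big(L\cdot D\cdot\log(L\cdot D)\big),\\
\frac{L\cdot D\cdot\log(L\cdot D)}{\sqrt{L\cdot D}\cdot\log(L\cdot D)} &= \sqrt{L\cdot D}.
\end{align*}
```

Thus the exponent of the upper bound exceeds that of the lower bound by a factor of $\sqrt{L\cdot D}$.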

Technically, we construct from an instance (G, k) of k × k Clique an instance
$(\mathcal{A} = (D, a^0, (P_L, P_C)), F_L)$ of LCR such that D = O(k) and L = O(k). Furthermore,
we show that G contains the desired clique of size k if and only if there
is a $t \in \mathbb{N}$ such that $c^0 \to^*_{\mathcal{A}^t} c$ with $c \in C^t_f$. Suppose we had an algorithm for
LCR running in time $2^{o(\sqrt{L\cdot D}\cdot\log(L\cdot D))}$. Combined with the reduction, this would
yield an algorithm for k × k Clique with runtime $2^{o(\sqrt{k^2}\cdot\log(k^2))} = 2^{o(k\log k)}$. But
unless ETH fails, such an algorithm cannot exist.


Proposition 7. LCR cannot be solved in time $2^{o(\sqrt{L\cdot D}\cdot\log(L\cdot D))}$ unless ETH fails.


We assume that the vertices V of G are given by tuples (i, j) with $i, j \in [1..k]$,
where i denotes the row and j denotes the column. In the reduction, we need the
leader and the contributors to communicate on the vertices of G. However, we
cannot store tuples (i, j) in the memory as this would cause a quadratic blow-up
$D = O(k^2)$. Instead, we communicate a vertex (i, j) as a string row(i).col(j). We
distinguish between row and column symbols to avoid stuttering, the repeated 
reading of the same symbol. With this, it cannot happen that a thread reads a 
row symbol twice and takes it for a column. 

The program starts its computation with each contributor choosing a vertex
(i, j) to store. For simplicity, we denote a contributor storing (i, j) by $P_{(i,j)}$. Note
that there can be multiple copies of $P_{(i,j)}$.

Since there are arbitrarily many contributors, the chosen vertices are only a
superset of the clique we want to find. To cut away the false vertices, the leader
$P_L$ guesses for each row the vertex belonging to the clique. To this end, the
program performs for each $i \in [1..k]$ the following steps: If $(i, j_i)$ is the vertex
of interest, $P_L$ first writes row(i) to the memory. Each contributor that is still
active reads the symbol and moves on by one state. Then $P_L$ communicates the
column by writing col($j_i$). Again, the active contributors $P_{(i',j')}$ read.

A contributor can react to the read symbol in three different ways: (1) If
$i' \neq i$, the contributor $P_{(i',j')}$ stores a vertex of a different row. The computation
in $P_{(i',j')}$ can only go on if $(i', j')$ is connected to $(i, j_i)$ in G. Otherwise it will
stop. (2) If $i' = i$ and $j' = j_i$, then $P_{(i',j')}$ stores exactly the vertex guessed by
$P_L$. In this case, $P_{(i',j')}$ can continue its computation. (3) If $i' = i$ and $j' \neq j_i$,
thread $P_{(i',j')}$ stores a different vertex from row i. The contributor has to stop
its computation.

After k such rounds, there are only contributors left that store vertices
guessed by $P_L$. Furthermore, each two of these vertices are connected. Hence,
they form a clique. To transmit this information to $P_L$, each $P_{(i,j)}$ writes $\#_i$ to
the memory, a special symbol for row i. After $P_L$ has read the string $\#_1 \dots \#_k$, it
moves to its final state. A formal construction can be found in the full version [9].
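The filtering performed by the contributors can be simulated directly. The Python sketch below is a hypothetical, purely combinatorial rendering of the reduction, not the automaton construction of [9]: vertices are pairs (i, j), edges are frozensets of two vertices, and `guesses` lists the columns $j_1, \dots, j_k$ chosen by the leader. Since there are arbitrarily many contributors, every vertex is assumed to be stored by some contributor. All function names are ours.

```python
from itertools import product

def survives(stored, guesses, edges):
    """Does contributor P_(i',j') survive the k rounds of the leader?"""
    i_s, j_s = stored
    for i, j in enumerate(guesses, start=1):
        if i != i_s:
            if frozenset({stored, (i, j)}) not in edges:
                return False  # case (1): not adjacent to the guessed vertex
        elif j != j_s:
            return False      # case (3): another vertex of its own row
        # otherwise case (1) with an edge, or case (2): continue
    return True

def leader_accepts(guesses, edges, k):
    """P_L reaches its final state iff it can read #_1 ... #_k."""
    rows = {i for i, j in product(range(1, k + 1), repeat=2)
            if survives((i, j), guesses, edges)}
    return rows == set(range(1, k + 1))

def has_kxk_clique(k, edges):
    """Existential choice of the leader over all guesses j_1, ..., j_k."""
    return any(leader_accepts(g, edges, k)
               for g in product(range(1, k + 1), repeat=k))
```

Only the guessed vertices can survive, and row i is reported by $\#_i$ exactly when $(i, j_i)$ is adjacent to all other guessed vertices, so the leader accepts precisely on a k × k clique.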


Absence of a Polynomial Kernel. A kernelization of a parameterized prob- 
lem is a compression algorithm. Given an instance, it returns an equivalent 
instance the size of which is bounded by a function only in the parameter. From 
an algorithmic perspective, kernels put a bound on the number of hard instances 
of the problem. Indeed, the search for small kernels is a key interest in algorithmics,
similar to the search for fast FPT algorithms. Moreover, it can be shown
that a decidable problem admits a kernel if and only if it admits an FPT algorithm [11].

Let Q be a parameterized problem. A kernelization of Q is an algorithm 
that transforms, in polynomial time, a given instance (B,k) into an equivalent 
instance (B', k') such that $|B'| + k' \le g(k)$, where g is a computable function. If
g is a polynomial, we say that Q admits a polynomial kernel. 

Unfortunately, for many problems the community failed to come up with 
polynomial kernels. This led to the contrary approach, namely disproving their
existence [4,5,23]. Such a result constitutes an exponential lower bound on the 
number of hard instances. Like computational hardness results, such a bound 
is seen as an indication of general hardness of the problem. Technically, the 
existence of a polynomial kernel for the problem of interest is shown to imply 
NP ⊆ coNP/poly. But this inclusion is unlikely as it would cause a collapse of
the polynomial hierarchy to the third level [36]. 

In order to link the existence of a polynomial kernel for LCR(D, L) with the
above inclusion, we follow the framework developed in [5]. Let Γ be an alphabet.
A polynomial equivalence relation is an equivalence relation R on Γ* with the
following properties: Given $x, y \in \Gamma^*$, it can be decided in time polynomial in
$|x| + |y|$ whether $(x, y) \in R$. Moreover, for each n there are at most polynomially
many equivalence classes in R restricted to $\Gamma^{\le n}$.

The key tool for proving kernel lower bounds are cross-compositions: Let
$L \subseteq \Gamma^*$ be a language and $Q \subseteq \Gamma^* \times \mathbb{N}$ be a parameterized language. We say
that L cross-composes into Q if there exists a polynomial equivalence relation
R and an algorithm C, the cross-composition, with the following properties: C
takes as input $y_1, \dots, y_I \in \Gamma^*$, all equivalent under R. It computes in time
polynomial in $\sum_{\ell=1}^{I} |y_\ell|$ a string $(y, k) \in \Gamma^* \times \mathbb{N}$ such that $(y, k) \in Q$ if and only
if there is an $\ell \in [1..I]$ with $y_\ell \in L$. Furthermore, $k \le p(\max_{\ell \in [1..I]} |y_\ell| + \log(I))$
for a polynomial p.

It was shown in [5] that a cross-composition of any NP-hard language into 
a parameterized language Q prohibits the existence of a polynomial kernel for 
Q unless NP ⊆ coNP/poly. In order to make use of this result, we show how to
cross-compose 3-SAT into LCR(D,L). This yields the following: 


Theorem 8. LCR(D,L) does not admit a poly. kernel unless NP ⊆ coNP/poly.


The difficulty of finding a cross-composition lies in the restriction on the size
of the parameters. This affects D and L: Both parameters are not allowed to
depend polynomially on I, the number of given 3-SAT-instances. We avoid
the polynomial dependence by encoding the choice of a 3-SAT-instance into the
contributors via a binary tree.


Proof (Idea). Assume some encoding of Boolean formulas as strings over a finite 
alphabet. We use the polynomial equivalence relation R defined as follows: Two 
strings y and w are equivalent under R if both encode 3-SAT-instances, and the 
numbers of clauses and variables coincide. On strings of bounded length, R has 
polynomially many equivalence classes. 

Let the given 3-SAT-instances be $\varphi_1, \dots, \varphi_I$. Every two of them are equivalent
under R. This means that all $\varphi_\ell$ have the same number of clauses m and use
the same set of variables $\{x_1, \dots, x_n\}$. We assume that $\varphi_\ell = C^\ell_1 \wedge \dots \wedge C^\ell_m$.

We construct a program proceeding in three phases. First, it chooses an
instance $\varphi_\ell$, then it guesses a valuation for all variables, and in the third phase
it verifies that the valuation satisfies $\varphi_\ell$. While the second and the third phase
do not cause a dependence of the parameters on I, the first phase does. It is not
possible to guess a number $\ell \in [1..I]$ and communicate it via the memory as this
would provoke a polynomial dependence of D on I.

To implement the first phase without a polynomial dependence, we transmit
the indices of the 3-SAT-instances in binary. The leader guesses and writes tuples
$(u_1, 1), \dots, (u_{\log(I)}, \log(I))$ with $u_i \in \{0,1\}$ to the memory. This amounts to
choosing an instance $\varphi_\ell$ with binary representation $\mathrm{bin}(\ell) = u_1 \dots u_{\log(I)}$.

It is the contributors' task to store this choice. Each time the leader writes
a tuple $(u_i, i)$, the contributors read it and branch either to the left, if $u_i = 0$, or
to the right, if $u_i = 1$. Hence, in the first phase, the contributors are binary trees
with I leaves, each leaf storing the index of an instance $\varphi_\ell$. Since we did not
assume that I is a power of 2, there may be computations arriving at leaves that
do not represent proper indices. In this case, the computation deadlocks.

The size of D and $P_L$ in the first phase is O(log(I)). This satisfies the size
restrictions of a cross-composition.

For guessing the valuation in the second phase, the system communicates on
tuples $(x_i, v)$ with $i \in [1..n]$ and $v \in \{0,1\}$. The leader guesses such a tuple for
each variable and writes it to the memory. Any participating contributor is free 
to read one of the tuples. After reading, it stores the variable and the valuation. 

In the third phase, the satisfiability check is performed as follows: Each contributor
that is still active has stored in its current state the chosen instance
$\varphi_\ell$, a variable $x_i$, and its valuation $v_i$. Assume that $x_i$, when evaluated to $v_i$,
satisfies $C^\ell_j$, the j-th clause of $\varphi_\ell$. Then the contributor loops in its current state
while writing the symbol $\#_j$. The leader waits to read the string $\#_1 \dots \#_m$. If
$P_L$ succeeds, we are sure that the m clauses of $\varphi_\ell$ were satisfied by the chosen
valuation. Thus, $\varphi_\ell$ is satisfiable and $P_L$ moves to its final state. For details of
the construction, we refer to the full version of the paper [9].
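The three phases can be mimicked in a few lines. The Python sketch below is illustrative only and rests on assumptions of ours: formulas are lists of clauses in DIMACS style (non-zero integers, negative for negated variables), the leader encodes ℓ as ℓ - 1 in ⌈log₂ I⌉ bits, and one contributor is assumed to pick up each tuple $(x_i, v)$ of phase 2. All function names are hypothetical.

```python
def index_bits(ell, I):
    """Bits u_1 ... u_w written by the leader to select instance ell in [1..I]."""
    width = max(1, (I - 1).bit_length())  # ceil(log2 I), at least one bit
    return [((ell - 1) >> (width - 1 - b)) & 1 for b in range(width)]

def contributor_leaf(bits, I):
    """Phase 1: branch left on 0, right on 1; leaves beyond I deadlock (None)."""
    node = 0
    for u in bits:
        node = 2 * node + u
    return node + 1 if node + 1 <= I else None

def leader_reaches_final(formulas, bits, valuation):
    """Phases 2 and 3 for the instance selected by bits."""
    ell = contributor_leaf(bits, len(formulas))
    if ell is None:
        return False  # every contributor deadlocks in an improper leaf
    clauses = formulas[ell - 1]
    written = set()
    for var, val in valuation.items():  # a contributor storing (x_var, val)
        lit = var if val else -var
        # It loops writing #_j for every clause j that x_var = val satisfies.
        written |= {j for j, c in enumerate(clauses, start=1) if lit in c}
    return written >= set(range(1, len(clauses) + 1))  # leader reads #_1 ... #_m
```

Only the tree depth ⌈log₂ I⌉ grows with I, which mirrors why D and $P_L$ stay within O(log(I)) in the first phase.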


3.2 Parameterization by Contributors 


We show that the number of states C of the contributor has a strong influence on the
complexity of LCR. We give an algorithm singly exponential in C, provide a matching lower
bound, and prove the absence of a polynomial kernel.


Upper Bound. Our algorithm is based on saturation. We keep the states reach- 
able by the contributors in a set and saturate it. This leads to a more compact 
representation of the program. Technically, we reduce LCR to a reachability 
problem on a finite automaton. The result is as follows. 


Proposition 9. LCR can be solved in time $O(4^C\cdot L^4\cdot D^3\cdot C^2)$.


The main observation is that keeping one set of states for all contributors
suffices to represent a computation. Let $S \subseteq Q_C$ be the set of states reachable
by the contributors in a given computation. By the Copycat Lemma [16], we can
assume for each $q \in S$ an arbitrary number of contributors that are currently
in state q. This means that we do not have to distinguish between different
contributor instances.

Formally, we reduce the search space to $Q_L \times D \times \mathcal{P}(Q_C)$. Instead of storing
explicit configurations, we store tuples $(q_L, a, S)$, where $q_L \in Q_L$, $a \in D$, and
$S \subseteq Q_C$. Between such tuples, the transition relation is as follows. Transitions
of the leader change the state and the memory as expected. The contributors
also change the memory but saturate S instead of changing the state. Formally,
if there is a transition from $q \in S$ to q', we add q' to S.
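The abstract state space can be explored by breadth-first search. The Python sketch below assumes the hypothetical transition encoding used earlier (triples with labels ("w", a), ("r", a), ("eps", None)); it follows the transition relation described above but is not tuned to meet the bound of Proposition 9. The function name `lcr_saturation` is ours.

```python
from collections import deque

def lcr_saturation(leader, contrib, qL0, qC0, a0, final):
    """BFS over tuples (q_L, a, S); True iff a leader state in final (F_L) is reachable."""
    start = (qL0, a0, frozenset({qC0}))
    seen, todo = {start}, deque([start])
    while todo:
        q, mem, S = todo.popleft()
        if q in final:
            return True
        succs = []
        # Leader moves: update its state and, for writes, the memory.
        for (p, (kind, b), p2) in leader:
            if p != q:
                continue
            if kind == "w":
                succs.append((p2, b, S))
            elif kind == "eps" or b == mem:
                succs.append((p2, mem, S))
        # Contributor moves: the memory may change, S is only saturated.
        for (p, (kind, b), p2) in contrib:
            if p not in S:
                continue
            if kind == "w":
                succs.append((q, b, S | {p2}))
            elif kind == "eps" or b == mem:
                succs.append((q, mem, S | {p2}))
        for nxt in succs:
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return False
```

Because S only grows, states are never removed from it: by the Copycat Lemma some contributor copy can always remain behind in every state already reached.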


Lemma 10. There is a $t \in \mathbb{N}$ so that $c^0 \to^*_{\mathcal{A}^t} c$ with $c \in C^t_f$ if and only if there
is a run from $(q^0_L, a^0, \{q^0_C\})$ to a state in $F_L \times D \times \mathcal{P}(Q_C)$.


The dominant factor in the complexity estimation of Proposition 9 is the
time needed to construct the state space. It takes time $O(4^C\cdot L^4\cdot D^3\cdot C^2)$. For
the definition and the proof of Lemma 10, we refer to the full version [9].


Lower Bound and Absence of a Polynomial Kernel. We present two lower 
bounds for LCR. The first is based on ETH: We show that there is no $2^{o(C)}$-time
algorithm for LCR unless ETH fails. This indicates that the above algorithm is
asymptotically optimal. Technically, we give a reduction from n-variable 3-SAT
to LCR such that the size of the contributor in the constructed instance is O(n).
Then a $2^{o(C)}$-time algorithm for LCR yields a $2^{o(n)}$-time algorithm for 3-SAT, a
contradiction to ETH.

With a similar reduction, one can cross-compose 3-SAT into LCR(C). This 
shows that the problem does not admit a polynomial kernel. The precise con- 
structions and proofs can be found in the full version [9]. 
Proposition 11.


(a) LCR cannot be solved in time $2^{o(C)}$ unless ETH fails.
(b) LCR(C) does not admit a polynomial kernel unless NP ⊆ coNP/poly.


4 Bounded-Stage Reachability 


The bounded-stage reachability problem is a simultaneous reachability problem. It 
asks whether all threads of a program can reach an unsafe state when restricted 
to s-stage computations. These are computations where the write permission 
changes s times. The problem was first analyzed in [1] and shown to be NP- 
complete for finite-state programs. We give matching upper and lower bounds in 
terms of fine-grained complexity and prove the absence of a polynomial kernel. 

Let $\mathcal{A} = (D, a^0, (P_i)_{i\in[1..t]})$ be a program. A stage is a computation in $\mathcal{A}$
where only one of the threads writes. The remaining threads are restricted to
reading the memory. An s-stage computation is a computation that can be split
into s parts, each of which forms a stage.
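Since reads and ε-moves fit into any stage, the number of stages a computation needs depends only on the sequence of writing threads. A small Python sketch, assuming a hypothetical encoding of a computation as a list of (thread, kind) steps with kind in {"w", "r", "eps"}; the function name `min_stages` is ours.

```python
def min_stages(steps):
    """Minimal number of stages into which the computation can be split."""
    writers = [t for t, kind in steps if kind == "w"]
    # Each change of the writing thread starts a new stage.
    changes = sum(1 for x, y in zip(writers, writers[1:]) if x != y)
    return 1 + changes
```

A computation is then an s-stage computation for every s at or above this value, provided empty stages are permitted.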


Bounded-Stage Reachability (BSR) 
Input: A program $\mathcal{A} = (D, a^0, (P_i)_{i\in[1..t]})$, a set $C_f \subseteq C$, and $s \in \mathbb{N}$.
Question: Is there an s-stage computation $c^0 \to^*_{\mathcal{A}} c$ for some $c \in C_f$?


We focus on a parameterization of BSR by P, the maximum number of states 
of a thread, and t, the number of threads. We denote it by BSR(P,t). We
prove that the parameterization is FPT and present a matching lower bound. The 
main result in this section is the absence of a polynomial kernel for BSR(P, t). 
The result is technically involved and reveals hardness of the problem. 

Parameterizations of BSR involving D and s, the number of stages, are not 
interesting for fine-grained complexity theory. We can show that BSR is NP-hard 
even for constant D and s. This immediately rules out FPT algorithms in these 
parameters. For details, we refer to the full version of the paper [9]. 


Upper Bound. We show that BSR(P,t) is fixed-parameter tractable. The idea 
is to reduce to reachability on a product automaton. The automaton stores the 
configurations, the current writer, and counts up to the number of stages s. To 
this end, it has $O^*(P^t)$ many states. Details can be found in the full version [9].
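A minimal Python sketch of such a product construction, under assumptions of ours: each thread is a set of transition triples with labels ("w", a), ("r", a), ("eps", None); the unsafe configurations are given per thread as sets of states that must be reached simultaneously; and s ≥ 1. A search state records the local states, the memory value, the current writer, and the number of stages used so far. The function name `bsr` is hypothetical.

```python
from collections import deque

def bsr(threads, inits, a0, unsafe, s):
    """Is there an s-stage computation reaching unsafe[i] in every thread i?"""
    start = (tuple(inits), a0, None, 1)  # no writer fixed yet, first stage
    seen, todo = {start}, deque([start])
    while todo:
        states, mem, writer, used = todo.popleft()
        if all(q in F for q, F in zip(states, unsafe)):
            return True
        for i, trans in enumerate(threads):
            for (p, (kind, b), p2) in trans:
                if p != states[i]:
                    continue
                if kind == "r" and b != mem:
                    continue  # reads must match the memory content
                new_mem, new_writer, new_used = mem, writer, used
                if kind == "w":
                    new_mem = b
                    if writer is None:
                        new_writer = i  # first writer of the current stage
                    elif writer != i:
                        new_writer, new_used = i, used + 1  # permission changes
                if new_used > s:
                    continue
                nxt = (states[:i] + (p2,) + states[i + 1:], new_mem, new_writer, new_used)
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return False
```

The search visits at most $P^t \cdot (|D| + 1) \cdot (t + 1) \cdot s$ states, so only the $P^t$ factor is exponential in the parameters.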


Proposition 12. BSR can be solved in time $O^*(P^{2t})$.


Lower Bound. By a reduction from k × k Clique, we show that a $2^{o(t\log(P))}$-time
algorithm for BSR would contradict ETH. Hence, the above algorithm is optimal.
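Writing the running time of Proposition 12 as a power of two shows why the two bounds match (assuming P ≥ 2):

```latex
P^{2t} = 2^{2t\cdot\log_2 P} = 2^{O(t\log P)}
```

So, up to polynomial factors, the algorithm runs in time $2^{O(t\log P)}$, while a $2^{o(t\log P)}$-time algorithm would contradict ETH.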


Proposition 13. BSR cannot be solved in time $2^{o(t\log(P))}$ unless ETH fails.


Fine-Grained Complexity of Safety Verification 33 


The reduction maps an instance of k × k Clique to an equivalent instance
$(\mathcal{A} = (D, a^0, (P_i)_{i\in[1..t]}), C_f, s)$ of BSR. Moreover, it keeps the parameters small.
We have that $P = O(k^2)$ and $t = O(k)$. As a consequence, a $2^{o(t\log(P))}$-time
algorithm for BSR would yield an algorithm for k × k Clique running in time
$2^{o(k\cdot\log(k^2))} = 2^{o(k\log(k))}$. But this contradicts ETH.


Proof (Idea). For the reduction, let $V = [1..k] \times [1..k]$ be the vertices of G. We
define $D = V \cup \{a^0\}$ to be the domain of the memory. We want the threads to
communicate on the vertices of G. For each row we introduce a reader thread
$P_i$ that is responsible for storing a particular vertex of the row. We also add
one writer, $P_{ch}$, that is used to steer the communication between the $P_i$. Our
program $\mathcal{A}$ is given by $(D, a^0, ((P_i)_{i\in[1..k]}, P_{ch}))$.

Intuitively, the program proceeds in two phases. In the first phase, each $P_i$
non-deterministically chooses a vertex from the i-th row and stores it in its state
space. This constitutes a clique candidate $(1, j_1), \dots, (k, j_k) \in V$. In the second
phase, thread $P_{ch}$ starts to write a guessed vertex $(1, j_1')$ of the first row to the
memory. The first thread $P_1$ reads $(1, j_1')$ from the memory and verifies that the
read vertex is actually the one from the clique candidate. The computation in
$P_1$ will deadlock if $j_1 \neq j_1'$. The threads $P_i$ with $i \neq 1$ also read $(1, j_1')$ from the
memory. They have to check whether there is an edge between the stored vertex
$(i, j_i)$ and $(1, j_1')$. If this fails in some $P_i$, the computation in that thread will also
deadlock. After this procedure, the writer $P_{ch}$ guesses a vertex $(2, j_2')$ and writes
it to the memory. Now the verification steps repeat. After k repetitions of the
procedure, we can ensure that the guessed clique candidate is indeed a clique.
Note that the whole communication takes one stage. Details are given in [9].


Absence of a Polynomial Kernel. We show that BSR(P,t) does not admit 
a polynomial kernel. To this end, we cross-compose 3-SAT into BSR(P, t). 


Theorem 14. BSR(P, t) does not admit a poly. kernel unless NP ⊆ coNP/poly.


In the present setting, coming up with a cross-composition is non-trivial. Both
parameters, P and t, are not allowed to depend polynomially on the number I
of given 3-SAT-instances. Hence, we cannot construct an NFA that distinguishes
the I instances by branching into I different directions. This would cause a
polynomial dependence of P on I. Furthermore, it is not possible to construct
an NFA for each instance as this would cause such a dependence of t on I. To
circumvent these problems, a deeper understanding of the model is needed.


Proof (Idea). Let $\varphi_1, \dots, \varphi_I$ be given 3-SAT-instances, where each two are
equivalent under R, the polynomial equivalence relation of Theorem 8. Then each $\varphi_\ell$
has m clauses and n variables $\{x_1, \dots, x_n\}$. We assume $\varphi_\ell = C^\ell_1 \wedge \dots \wedge C^\ell_m$.

In the program that we construct, the communication is based on 4-tuples of
the form $(\ell, j, i, v)$. Intuitively, such a tuple transports the following information:
The j-th clause in instance $\varphi_\ell$, $C^\ell_j$, can be satisfied by variable $x_i$ with valuation
v. Hence, our data domain is $D = ([1..I] \times [1..m] \times [1..n] \times \{0,1\}) \cup \{a^0\}$.
For choosing and storing a valuation of the $x_i$, we introduce so-called variable
threads $P_{x_1}, \dots, P_{x_n}$. In the beginning, each $P_{x_i}$ non-deterministically chooses a
valuation for $x_i$ and stores it in its states.

We further introduce a writer $P_w$. During a computation, this thread guesses
exactly m tuples $(\ell_1, 1, i_1, v_1), \dots, (\ell_m, m, i_m, v_m)$ in order to satisfy m clauses
of potentially different instances. Each $(\ell_j, j, i_j, v_j)$ is written to the memory by
$P_w$. All variable threads then start to read the tuple. If $P_{x_i}$ with $i \neq i_j$ reads it,
then the thread will just move one state further since the suggested tuple does
not affect the variable $x_i$. If $P_{x_i}$ with $i = i_j$ reads the tuple, the thread will only
continue its computation if $v_j$ coincides with the value that $P_{x_i}$ guessed for $x_i$
and, moreover, $x_i$ with value $v_j$ satisfies clause $C^{\ell_j}_j$.

Now suppose the writer did exactly m steps while each variable thread did 
exactly m + 1 steps. This proves the satisfiability of m clauses by the chosen 
valuation. But these clauses can be part of different instances: It is not ensured 
that the clauses were chosen from one formula $\varphi_\ell$. The major difficulty of the
cross-composition lies in how to ensure exactly this. 

We overcome the difficulty by introducing so-called bit checkers $P_b$, where
$b \in [1..\log(I)]$. Each $P_b$ is responsible for the b-th bit of bin(ℓ), the binary
representation of ℓ, where $\varphi_\ell$ is the instance we want to satisfy. When $P_w$ writes
the first tuple $(\ell_1, 1, i_1, v_1)$, each $P_b$ reads it and stores either 0 or
1, according to the b-th bit of $\mathrm{bin}(\ell_1)$. After $P_w$ has written a second tuple
$(\ell_2, 2, i_2, v_2)$, the bit checker $P_b$ tests whether the b-th bits of $\mathrm{bin}(\ell_1)$ and $\mathrm{bin}(\ell_2)$
coincide; otherwise it deadlocks. This is repeated every time $P_w$ writes a
new tuple to the memory.

Assume the computation does not deadlock in any of the $P_b$. Then we can
ensure that the b-th bit of $\mathrm{bin}(\ell_j)$ with $j \in [1..m]$ never changed during the
computation. This means that $\mathrm{bin}(\ell_1) = \dots = \mathrm{bin}(\ell_m)$. Hence, the writer $P_w$
has chosen clauses of just one instance $\varphi_\ell$ and with the current valuation, it is
possible to satisfy the formula. Since the parameters are bounded by $P \in O(m)$
and $t \in O(n + \log(I))$, the construction constitutes a proper cross-composition.
For a formal construction and proof, we refer to the full version [9].
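The interplay of the writer, the variable threads, and the bit checkers can be summarized as a check on the written tuples. The Python sketch below is a hypothetical simplification, not the automaton construction of [9]: formulas are DIMACS-style clause lists (all with m clauses), the valuation is the one stored by the variable threads, and ℓ is compared bitwise via ℓ - 1 in ⌈log₂ I⌉ bits. The function name `composition_accepts` is ours.

```python
def composition_accepts(formulas, valuation, tuples):
    """formulas[l-1] is phi_l; valuation maps each variable to the value stored
    by P_{x_i}; tuples are the (l_j, j, i_j, v_j) written by P_w."""
    m = len(formulas[0])
    if len(tuples) != m:
        return False  # P_w writes exactly m tuples
    width = max(1, (len(formulas) - 1).bit_length())  # number of bit checkers
    for j, (l, jj, i, v) in enumerate(tuples, start=1):
        if jj != j:
            return False  # the j-th tuple must refer to clause j
        lit = i if v else -i
        # Variable thread P_{x_i}: stored value agrees and the literal satisfies C^l_j.
        if valuation[i] != v or lit not in formulas[l - 1][j - 1]:
            return False
    # Bit checker for bit b (0-indexed here): the bit of bin(l_j) never changes.
    for b in range(width):
        if len({((l - 1) >> b) & 1 for (l, _, _, _) in tuples}) > 1:
            return False
    return True
```

Without the bit checkers, the first test case below would be accepted even though it combines a clause of $\varphi_1$ with a clause of $\varphi_2$ and neither formula is satisfied by the valuation.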


5 Conclusion 


We studied several parameterizations of LCR and BSR, two safety verification 
problems for shared-memory concurrent programs. For LCR, we identified the 
parameters D, L, and C. Our first algorithm showed that LCR(D,L) is FPT. Then, 
we used a modification of the algorithm to obtain a verification procedure valu- 
able for practical instances. The main insight was that due to a factorization 
along strongly connected components, the impact of L can be reduced to a poly- 
nomial factor in the time complexity. We also proved the absence of a polynomial 
kernel for LCR(D,L) and presented a lower bound which is a root factor away 
from the upper bound. For LCR(C) we gave a tight upper and lower bound. 
The parameters of interest for BSR are P and t. We have shown that BSR(P, t) is FPT and gave a matching lower bound. The main contribution was to prove it unlikely that a polynomial kernel exists for BSR(P, t). The proof relies on a technically involved cross-composition that avoids a polynomial dependence of the parameters on the number of given 3-SAT instances.
Abstract. Reconfigurable broadcast networks provide a convenient formalism for modelling and reasoning about networks of mobile agents
broadcasting messages to other agents following some (evolving) commu- 
nication topology. The parameterized verification of such models aims at 
checking whether a given property holds irrespective of the initial config- 
uration (number of agents, initial states and initial communication topol- 
ogy). We focus here on the synchronization property, asking whether all 
agents converge to a set of target states after some execution. This prob- 
lem is known to be decidable in polynomial time when no constraints 
are imposed on the evolution of the communication topology (while it is 
undecidable for static broadcast networks). 

In this paper we investigate how various constraints on reconfigurations affect the decidability and complexity of the synchronization problem. In particular, we show that when bounding the number of reconfigured links between two communication steps by a constant, synchronization becomes undecidable; on the other hand, synchronization remains decidable in PTIME when the bound grows with the number of agents.


1 Introduction 


There are numerous application domains for networks formed of an arbitrary 
number of anonymous agents executing the same code: prominent examples are 
distributed algorithms, communication protocols, cache-coherence protocols, and 
biological systems such as populations of cells or individuals, etc. The automated 
verification of such systems is challenging [3,8,12,15]: its aim is to validate at 
once all instances of the model, independently of the (parameterized) number of 
agents. Such a problem can be phrased in terms of infinite-state-system verifica- 
tion. Exploiting symmetries may lead to efficient algorithms for the verification 
of relevant properties [7]. 

Different means of interactions between agents can be considered in such 
networks, depending on the application domain. Typical examples are shared 
variables [4,10,13], rendez-vous [12], and broadcast communications [6,9]. In this
paper, we target ad hoc networks [6], in which the agents can broadcast messages 
simultaneously to all their neighbours, i.e., to all the agents that are within their 
radio range. The number of agents and the communication topology are fixed 
once and for all at the beginning of the execution. Parameterized verification of 
broadcast networks checks if a specification is met independently of the number 
of agents and communication topology. It is usually simpler to reason about the 
dual problem of the existence of an initial configuration (consisting of a network 
size, an initial state for each agent, and a communication topology) from which 
some execution violates the given specification. 

Several types of specifications have been considered in the literature. We focus 
here on coverability and synchronization: does there exist an initial configuration 
from which some agent (resp. all agents at the same time) may reach a particular 
set of target states. Both problems are undecidable; decidability of coverability 
can be regained by bounding the length of simple paths in the communication 
topology [6]. 

In the case of mobile ad hoc networks (MANETs), agents are mobile, so that 
the communication links (and thus the neighbourhood of each agent) may evolve 
over time. To reflect the mobility of agents, Delzanno et al. studied reconfigurable 
broadcast networks [5,6]. In such networks, the communication topology can 
change arbitrarily at any time. Perhaps surprisingly, this modification not only 
allows for a more faithful modelling of MANETs, but it also leads to decidability 
of both the coverability and the synchronization problems [6]. A probabilistic 
extension of reconfigurable broadcast networks has been studied in [1,2] to model 
randomized protocols. 

A drawback of the semantics of reconfigurable broadcast networks is that it allows arbitrary changes at each reconfiguration. Such arbitrary reconfigurations
may not be realistic, especially in settings where communications are frequent 
enough, and mobility is slow and not chaotic. In this paper, we limit the impact 
of reconfigurations in several ways, and study how those limitations affect the 
decidability and complexity of parameterized verification of synchronization. 

More specifically, we restrict reconfigurations by limiting the number of 
changes in the communication graph, either by considering global constraints (on 
the total number of edges being modified), or by considering local constraints 
(on the number of updates affecting each individual node). We prove that syn- 
chronization is decidable when imposing constant local constraints, as well as 
when imposing global constraints depending (as a divergent function) on the 
number of agents. On the other hand, imposing a constant global bound makes 
synchronization undecidable. We recover decidability by bounding the maximal 
degree of each node by 1. 


2 Broadcast Networks with Constrained Reconfiguration 


In this section, we first define reconfigurable broadcast networks; we then intro- 
duce several constraints on reconfigurations along executions, and investigate 
how they compare one to another and with unconstrained reconfigurations. 
Fig. 2. Sample execution under reconfigurable semantics, synchronizing to $\{q_4, q_6, q_8\}$ (B-transitions are communication steps, R are reconfiguration steps.)


2.1 Reconfigurable Broadcast Networks 


Definition 1. A broadcast protocol is a tuple $P = (Q, I, \Sigma, \Delta)$ where $Q$ is a finite set of control states; $I \subseteq Q$ is the set of initial control states; $\Sigma$ is a finite alphabet; and $\Delta \subseteq Q \times \{!!a, ??a \mid a \in \Sigma\} \times Q$ is the transition relation.


A (reconfigurable) broadcast network is a system made of several copies of a single broadcast protocol $P$. Configurations of such a network are undirected graphs in which each node is labelled with a state of $P$. Transitions between configurations can either be reconfigurations of the communication topology (i.e., changes in the edges of the graph), or a communication via broadcast of a message (i.e., changes in the labelling of the graph). Figures 1 and 2 respectively display an example of a broadcast protocol and of an execution of a network made of three copies of that protocol.

Formally, we first define undirected labelled graphs. Given a set $\mathcal{L}$ of labels, an $\mathcal{L}$-graph is an undirected graph $G = (N, E, L)$ where $N$ is a finite set of nodes; $E \subseteq \mathcal{P}_2(N)$¹ (notice in particular that such a graph has no self-loops); finally, $L \colon N \to \mathcal{L}$ is the labelling function. We let $\mathcal{G}_{\mathcal{L}}$ denote the (infinite) set of $\mathcal{L}$-labelled graphs. Given a graph $G \in \mathcal{G}_{\mathcal{L}}$, we write $n \sim n'$ whenever $\{n, n'\} \in E$, and we let $\mathrm{Neigh}_G(n) = \{n' \mid n \sim n'\}$ be the neighbourhood of $n$, i.e. the set of nodes adjacent to $n$. For a label $\ell$, we denote by $|G|_\ell$ the number of nodes in $G$ labelled by $\ell$. Finally, $L(G)$ denotes the set of labels appearing in nodes of $G$.

The semantics of a reconfigurable broadcast network based on broadcast pro- 
tocol P is an infinite-state transition system T(P). The configurations of T(P) 
are Q-labelled graphs. Intuitively, each node of such a graph runs protocol P, 


1 For a finite set $S$ and $1 \le k \le |S|$, we let $\mathcal{P}_k(S) = \{T \subseteq S \mid |T| = k\}$.
and may send/receive messages to/from its neighbours. A configuration $(N, E, L)$ is said to be initial if $L(N) \subseteq I$. From a configuration $G = (N, E, L)$, two types of steps are possible. More precisely, there is a step from $(N, E, L)$ to $(N', E', L')$ if one of the following two conditions holds:


(reconfiguration step) $N' = N$ and $L' = L$: a reconfiguration step does not change the set of nodes and their labels, but may change the edges arbitrarily;
(communication step) $N' = N$, $E' = E$, and there exist $n \in N$ and $a \in \Sigma$ such that $(L(n), !!a, L'(n)) \in \Delta$, and for every $n' \neq n$, if $n' \in \mathrm{Neigh}_G(n)$, then $(L(n'), ??a, L'(n')) \in \Delta$, otherwise $L'(n') = L(n')$: a communication step reflects how nodes evolve when one of them broadcasts a message to its neighbours.
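A communication step can be sketched in Python as follows. The encoding is our own assumption: a configuration is a dictionary from node names to states together with a set of two-element frozensets for the edges, and $\Delta$ is a set of tuples `(q, kind, a, q2)` with `kind` equal to `"!!"` or `"??"`. The hypothetical function `communication_successors` enumerates all labellings $L'$ obtainable when node `n` broadcasts `a`; if some neighbour has no `??a` transition, no step is possible.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

Labels = Dict[str, str]
Edges = Set[FrozenSet[str]]
Transition = Tuple[str, str, str, str]  # (q, "!!" or "??", message, q')


def neighbours(edges: Edges, n: str) -> Set[str]:
    """Neigh_G(n): nodes sharing an edge with n."""
    return {m for e in edges if n in e for m in e if m != n}


def communication_successors(labels: Labels, edges: Edges, delta: Set[Transition],
                             n: str, a: str) -> List[Labels]:
    """All labellings L' obtained when node n broadcasts a (edges are unchanged)."""
    nbrs = sorted(neighbours(edges, n))
    senders = [q2 for (q1, kind, msg, q2) in delta
               if q1 == labels[n] and kind == "!!" and msg == a]
    # One list of possible target states per neighbour; an empty list blocks the step.
    receivers = [[q2 for (q1, kind, msg, q2) in delta
                  if q1 == labels[m] and kind == "??" and msg == a] for m in nbrs]
    successors = []
    for q_n, choice in product(senders, product(*receivers)):
        new = dict(labels)  # non-neighbours keep their label
        new[n] = q_n
        for m, q_m in zip(nbrs, choice):
            new[m] = q_m
        successors.append(new)
    return successors


labels = {"x": "q0", "y": "q0", "z": "q0"}
edges = {frozenset({"x", "y"})}
delta = {("q0", "!!", "a", "q1"), ("q0", "??", "a", "q2"), ("q0", "!!", "b", "q3")}
print(communication_successors(labels, edges, delta, "x", "a"))
```

In this model a missing reception blocks the step; the protocols in the text instead make unspecified receptions lead to an explicit sink state, which amounts to adding those transitions to `delta`.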


An execution of the reconfigurable broadcast network is a sequence $\rho = (G_i)_{0 \le i \le r}$ of configurations such that for any $i < r$, there is a step from $G_i$ to $G_{i+1}$, and $\rho$ strictly alternates communication and reconfiguration steps (the latter possibly being trivial). An execution is initial if it starts from an initial configuration.

Important ingredients that we use heavily in the sequel are the juxtaposition of configurations and the shuffling of executions. The juxtaposition of two configurations $G = (N, E, L)$ and $G' = (N', E', L')$ is the configuration $G \oplus G' = (N \uplus N', E \uplus E', L_\oplus)$, in which $L_\oplus$ extends both $L$ and $L'$: $L_\oplus(n) = L(n)$ if $n \in N$ and $L_\oplus(n) = L'(n)$ if $n \in N'$. We write $G^2$ for the juxtaposition of $G$ with itself and, inductively, $G^N$ for the juxtaposition of $G^{N-1}$ with $G$. A shuffle of two executions $\rho = (G_i)_{0 \le i \le r}$ and $\rho' = (G'_j)_{0 \le j \le r'}$ is an execution $\rho_\oplus$ from $G_0 \oplus G'_0$ to $G_r \oplus G'_{r'}$, obtained by interleaving $\rho$ and $\rho'$. Note that a reconfiguration step in $\rho_\oplus$ may be composed of reconfigurations from both $\rho$ and $\rho'$. We write $\rho \otimes \rho'$ for the set of shuffle executions obtained from $\rho$ and $\rho'$.
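Juxtaposition is a disjoint union of labelled graphs. In the minimal Python sketch below, disjointness is enforced by tagging every node with 0 or 1, which is our own implementation choice; `juxtapose`, `power` and `state_counts` are hypothetical names, and a graph is assumed to be a pair (labelling dictionary, frozenset of edges).

```python
from collections import Counter
from typing import Dict, FrozenSet, Hashable, Tuple

Graph = Tuple[Dict[Hashable, str], FrozenSet[FrozenSet[Hashable]]]  # (labels, edges)


def juxtapose(g1: Graph, g2: Graph) -> Graph:
    """G1 ⊕ G2: disjoint union, made disjoint by tagging nodes with 0 or 1."""
    labels: Dict[Hashable, str] = {}
    edges = set()
    for tag, (lab, edg) in enumerate((g1, g2)):
        for node, state in lab.items():
            labels[(tag, node)] = state
        for e in edg:
            edges.add(frozenset((tag, node) for node in e))
    return labels, frozenset(edges)


def power(g: Graph, k: int) -> Graph:
    """G^k for k >= 1, built inductively as G^(k-1) ⊕ G."""
    result = g
    for _ in range(k - 1):
        result = juxtapose(result, g)
    return result


def state_counts(g: Graph) -> Counter:
    """|G|_q for every state q occurring in G."""
    return Counter(g[0].values())


g = ({"u": "q0", "v": "q1"}, frozenset({frozenset({"u", "v"})}))
g3 = power(g, 3)
print(len(g3[0]), len(g3[1]), state_counts(g3))
```

The counts illustrate the property used repeatedly in the proofs: $|G^k|_q = k \cdot |G|_q$ for every state $q$.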

Natural decision problems for reconfigurable broadcast networks include checking whether some node may reach a target state, or whether all nodes may synchronize to a set of target states. More precisely, given a broadcast protocol $P$ and a subset $F \subseteq Q$, the coverability problem asks whether there exists an initial execution $\rho$ that visits a configuration $G$ with $L(G) \cap F \neq \emptyset$, and the synchronization problem asks whether there exists an initial execution $\rho$ that visits a configuration $G$ with $L(G) \subseteq F$. For unconstrained reconfigurations, we have:


Theorem 2 ([5,6,11]). The coverability and synchronization problems are
decidable in PTIME for reconfigurable broadcast protocols. 


Remark 1. The synchronization problem was proven decidable in [6], and PTIME 
membership was given in [11, p. 41]. The algorithm consists in computing the 
set of states of P that are both reachable (i.e., coverable) from an initial con- 
figuration and co-reachable from a target configuration. This can be performed 
by applying iteratively the algorithm of [5] for computing the set of reachable 
states (with reversed transitions for computing co-reachable states). 
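Remark 1 describes the PTIME procedure only in words. The Python sketch below is our own reading of it, under stated assumptions: coverable states are computed by the usual saturation (a state $q'$ is added if some reached state $q$ has a transition $(q, !!a, q')$, or a transition $(q, ??a, q')$ while some reached state can broadcast $a$), co-reachable states are obtained by running the same saturation on reversed transitions starting from $F$, and both computations are iterated on the surviving states until a fixpoint is reached. We have not checked that this coincides in every detail with the algorithms of [5] and [11]; the function names and the small protocol used as an example are hypothetical.

```python
from typing import Set, Tuple

Transition = Tuple[str, str, str, str]  # hypothetical encoding: (q, "!!" or "??", a, q')


def coverable(start: Set[str], delta: Set[Transition]) -> Set[str]:
    """Saturation: states coverable from configurations labelled in `start`,
    assuming unconstrained reconfigurations and as many copies as needed."""
    reached = set(start)
    changed = True
    while changed:
        changed = False
        sendable = {a for (q, kind, a, _) in delta if kind == "!!" and q in reached}
        for (q, kind, a, q2) in delta:
            if q in reached and q2 not in reached and (kind == "!!" or a in sendable):
                reached.add(q2)
                changed = True
    return reached


def reverse(delta: Set[Transition]) -> Set[Transition]:
    return {(q2, kind, a, q) for (q, kind, a, q2) in delta}


def synchronizing_states(init: Set[str], targets: Set[str], delta: Set[Transition]) -> Set[str]:
    """Shrink the state set until every state is coverable from I and
    co-coverable from F using only transitions inside the current set."""
    s = set(init) | set(targets) | {t[0] for t in delta} | {t[3] for t in delta}
    while True:
        inner = {t for t in delta if t[0] in s and t[3] in s}
        new = coverable(set(init) & s, inner) & coverable(set(targets) & s, reverse(inner))
        if new == s:
            return s
        s = new


def can_synchronize(init: Set[str], targets: Set[str], delta: Set[Transition]) -> bool:
    return bool(synchronizing_states(init, targets, delta) & set(init))


delta = {("q0", "!!", "a", "q1"), ("q0", "??", "a", "q2"), ("q2", "!!", "b", "q3")}
print(can_synchronize({"q0"}, {"q1", "q3"}, delta))  # True
print(can_synchronize({"q0"}, {"q2"}, delta))        # False
```

In the first query, two connected copies in $q_0$ exchange $a$ (reaching $q_1$ and $q_2$), disconnect, and the copy in $q_2$ broadcasts $b$. In the second, every copy reaching $q_2$ requires another copy to end in $q_1 \notin F$, so the fixpoint becomes empty.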
Example 1. Consider the broadcast protocol of Fig. 1 with $I = \{q_0\}$. From each state, unspecified message receptions lead to an (omitted) sink state; this way, each broadcast message triggers a transition in all the neighbouring copies.

For that broadcast protocol, one easily sees that it is possible to synchronize to the set $\{q_4, q_6, q_8\}$. Moreover, three copies are necessary and sufficient for that objective, as witnessed by the execution of Fig. 2. The initial configuration has three copies and two edges. If the central node broadcasts $a$, the other two nodes receive it, one proceeding to $q_5$ and the other to $q_7$. Then, we assume the communication topology is emptied before the same node broadcasts $b$, moving to $q_2$. Finally, the node in $q_5$ connects to the one in $q_2$ to communicate on $c$ and then disconnects, followed by a similar communication on $d$ initiated by the node in $q_7$.


2.2 Natural Constraints for Reconfiguration 


Allowing arbitrary changes in the network topology may look unrealistic. In order 
to address this issue, we introduce several ways of bounding the number of recon- 
figurations after each communication step. For this, we consider the following 
natural pseudometric between graphs, which for simplicity we call distance. 


Definition 3. Let $G = (N, E, L)$ and $G' = (N', E', L')$ be two $\mathcal{L}$-labelled graphs. The distance between $G$ and $G'$ is defined as

$\mathrm{dist}(G, G') = |(E \cup E') \setminus (E \cap E')|$

when $N = N'$ and $L = L'$, and $\mathrm{dist}(G, G') = 0$ otherwise.


Setting the “distance” to 0 for two graphs that do not agree on the set of nodes or on the labelling function might seem strange at first. This choice is motivated by the definition of constraints on executions (see below) and of the number of reconfigurations along an execution (see Sect. 2.3). Other distances may be of interest in this context; in particular, for a fixed node $n \in N$, we let $\mathrm{dist}_n(G, G')$ be the number of edges involving node $n$ in the symmetric difference of $E$ and $E'$ (still assuming $N = N'$ and $L = L'$).


Constant Number of Reconfigurations per Step. A first natural constraint on 
reconfiguration consists in bounding the number of changes in a reconfiguration 
step by a constant number. Recall that along executions, communication and 
reconfiguration steps strictly alternate. 


Definition 4. Let $k \in \mathbb{N}$. An execution $\rho = (G_i)_{0 \le i \le r}$ of a reconfigurable broadcast network is $k$-constrained if for every index $i < r$, it holds that $\mathrm{dist}(G_i, G_{i+1}) \le k$.
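The distances of Definition 3 and the check of Definition 4 translate directly into code. In the Python sketch below, a graph is assumed to be a pair (labelling dictionary, frozenset of edges), so that equality of the dictionaries captures both $N = N'$ and $L = L'$. Returning 0 from `dist_node` when the labellings differ is our own choice, mirroring `dist`; the text defines $\mathrm{dist}_n$ only for graphs with the same nodes and labels. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Graph = Tuple[Dict[str, str], FrozenSet[FrozenSet[str]]]  # (labels, edges)


def dist(g1: Graph, g2: Graph) -> int:
    """dist(G, G'): size of the symmetric difference of the edge sets,
    or 0 when the node sets or labellings differ (Definition 3)."""
    (lab1, e1), (lab2, e2) = g1, g2
    if lab1 != lab2:  # equal dicts <=> same nodes and same labels
        return 0
    return len(e1 ^ e2)


def dist_node(g1: Graph, g2: Graph, n: str) -> int:
    """dist_n(G, G'): changed edges that involve node n."""
    (lab1, e1), (lab2, e2) = g1, g2
    if lab1 != lab2:
        return 0
    return sum(1 for e in e1 ^ e2 if n in e)


def is_k_constrained(execution: List[Graph], k: int) -> bool:
    """Definition 4: every step changes at most k edges."""
    return all(dist(g, h) <= k for g, h in zip(execution, execution[1:]))


labels = {"x": "q0", "y": "q0", "z": "q0"}
g0 = (labels, frozenset())
g1 = (dict(labels), frozenset({frozenset({"x", "y"}), frozenset({"x", "z"})}))
g2 = ({"x": "q1", "y": "q2", "z": "q0"}, g1[1])  # a communication step from g1
print(dist(g0, g1), dist_node(g0, g1, "x"), dist(g1, g2))  # 2 2 0
```

Since communication steps keep the edges unchanged, they always contribute 0, so only reconfiguration steps are actually constrained.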


Example 1 (Contd). For the synchronization problem, bounding the number of reconfigurations makes a difference. The sample execution from Fig. 2 is not 1-constrained, and actually no 1-constrained execution of that broadcast protocol can synchronize to $\{q_4, q_6, q_8\}$. This can be shown by exhibiting and proving an invariant on the reachable configurations (see Lemma 10).
Beyond Constant Number of Reconfigurations per Step. Bounding the number of
reconfigurations per step by a constant is somewhat restrictive, especially when 
this constant does not depend on the size of the network. We introduce other 
kinds of constraints here, for instance by bounding the number of reconfigura- 
tions by k on average along the execution, or by having a bound that depends 
on the number of nodes executing the protocol. 

For a finite execution $\rho = (G_i)_{0 \le i \le r}$ of a reconfigurable broadcast network, we write $\mathrm{nb\_comm}(\rho)$ for the number of communication steps along $\rho$ (notice that $\lfloor r/2 \rfloor \le \mathrm{nb\_comm}(\rho) \le \lceil r/2 \rceil$ since we require strict alternation between reconfiguration and communication steps), and $\mathrm{nb\_reconf}(\rho)$ for the total number of edge reconfigurations in $\rho$, that is $\mathrm{nb\_reconf}(\rho) = \sum_{i<r} \mathrm{dist}(G_i, G_{i+1})$.


Definition 5. Let $k \in \mathbb{N}$. An execution $\rho$ of a reconfigurable broadcast network is said to be $k$-balanced if it starts and ends with a communication step, and satisfies $\mathrm{nb\_reconf}(\rho) \le k \cdot (\mathrm{nb\_comm}(\rho) - 1)$.


This indeed captures our intuition that along a $k$-balanced execution, reconfiguration steps update at most $k$ links on average (such an execution has exactly $\mathrm{nb\_comm}(\rho) - 1$ reconfiguration steps).
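Checking Definition 5 only requires the kind of each step and the size of each reconfiguration. The Python sketch below assumes a hypothetical encoding of an execution as a list of steps, `("C", 0)` for a communication step and `("R", d)` for a reconfiguration step changing `d` edges; the function names are ours.

```python
from typing import List, Tuple

# Hypothetical encoding: ("C", 0) is a communication step, ("R", d) a reconfiguration changing d edges.
Step = Tuple[str, int]


def check_alternation(steps: List[Step]) -> None:
    """Raise if two consecutive steps have the same kind."""
    for s, t in zip(steps, steps[1:]):
        if s[0] == t[0]:
            raise ValueError("communication and reconfiguration steps must alternate")


def nb_comm(steps: List[Step]) -> int:
    return sum(1 for kind, _ in steps if kind == "C")


def nb_reconf(steps: List[Step]) -> int:
    return sum(d for kind, d in steps if kind == "R")


def is_k_balanced(steps: List[Step], k: int) -> bool:
    """Definition 5: starts and ends with C, and nb_reconf <= k * (nb_comm - 1)."""
    check_alternation(steps)
    if not steps or steps[0][0] != "C" or steps[-1][0] != "C":
        return False
    return nb_reconf(steps) <= k * (nb_comm(steps) - 1)


execution = [("C", 0), ("R", 3), ("C", 0), ("R", 0), ("C", 0)]
print(is_k_balanced(execution, 2))  # True: 3 <= 2 * (3 - 1)
print(is_k_balanced(execution, 1))  # False: 3 > 1 * (3 - 1)
```

This particular execution is 2-balanced without being 2-constrained, since one step changes 3 edges; Theorem 11 below states that the existence of a 2-balanced synchronizing execution nevertheless yields a 2-constrained one.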

Finally, we will also consider two relevant ways to constrain reconfigurations 
depending on the size of the network: first locally, bounding the number of 
reconfigurations per node by a constant; second globally, bounding the total 
number of reconfigurations by a function of the number of nodes. 

We first bound reconfigurations locally. 


Definition 6. Let $k \in \mathbb{N}$. An execution $\rho = (G_i)_{0 \le i \le r}$ of a reconfigurable broadcast network is $k$-locally-constrained if, for every node $n$ and for every index $i < r$, $\mathrm{dist}_n(G_i, G_{i+1}) \le k$.


One may also bound the number of reconfigurations globally using bounding functions that depend on the number of nodes in the network:


Definition 7. Let $f \colon \mathbb{N} \to \mathbb{N}$ be a function. An execution $\rho = (G_i)_{0 \le i \le r}$ of a reconfigurable broadcast network is $f$-constrained if, writing $n$ for the number of nodes in $G_0$, it holds that $\mathrm{dist}(G_i, G_{i+1}) \le f(n)$ for any $i < r$.
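Definitions 6 and 7 can be checked in the same style. The Python sketch below assumes the same hypothetical graph encoding as a pair (labelling dictionary, frozenset of edges) and treats graphs with different labellings as having no changed edges, as in Definition 3; the function names are ours.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Graph = Tuple[Dict[str, str], FrozenSet[FrozenSet[str]]]  # (labels, edges)


def changed_edges(g1: Graph, g2: Graph) -> Set[FrozenSet[str]]:
    """Symmetric difference of the edge sets; empty if nodes or labels differ."""
    (lab1, e1), (lab2, e2) = g1, g2
    return set() if lab1 != lab2 else set(e1 ^ e2)


def is_k_locally_constrained(execution: List[Graph], k: int) -> bool:
    """Definition 6: at each step, every node is involved in at most k changed edges."""
    nodes = execution[0][0].keys()
    return all(sum(1 for e in changed_edges(g, h) if n in e) <= k
               for g, h in zip(execution, execution[1:]) for n in nodes)


def is_f_constrained(execution: List[Graph], f: Callable[[int], int]) -> bool:
    """Definition 7: at each step, at most f(|G_0|) edges change."""
    bound = f(len(execution[0][0]))
    return all(len(changed_edges(g, h)) <= bound for g, h in zip(execution, execution[1:]))


labels = {n: "q0" for n in "abcd"}
g0 = (labels, frozenset())
g1 = (dict(labels), frozenset({frozenset("ab"), frozenset("cd")}))
print(is_k_locally_constrained([g0, g1], 1))        # True
print(is_f_constrained([g0, g1], lambda n: 1))      # False
print(is_f_constrained([g0, g1], lambda n: n // 2)) # True
```

The example reconfiguration creates two disjoint links: it is 1-locally-constrained but not 1-constrained, and it is $f$-constrained for the (hypothetical) bound $f(n) = \lfloor n/2 \rfloor$.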


Notice that if $f$ is the constant function $n \in \mathbb{N} \mapsto k$ for some $k \in \mathbb{N}$, $f$-constrained executions coincide with $k$-constrained ones, so that our terminology is non-ambiguous. Other natural bounding functions are non-decreasing and diverging. This way, the number of possible reconfigurations tends to infinity when the network size grows, i.e. $\forall n.\ \exists k.\ f(k) \ge n$.


Remark 2. Coverability under constrained reconfigurations is easily observed to 
be equivalent to coverability with unconstrained reconfigurations: from an uncon- 
strained execution, we can simply juxtapose extra copies of the protocol, which 
would perform extra communication steps so as to satisfy the constraint. When 
dealing with synchronization, this technique does not work since the extra copies 
would also have to synchronize to a target state. As a consequence, we only focus 
on synchronization in the rest of this paper. 
2.3 Classification of Constraints


In this section, we compare our restrictions. We prove that, for the synchro- 
nization problem, k-locally-constrained and f-constrained reconfigurations, for 
diverging functions f, are equivalent to unconstrained reconfigurations. On the 
other hand, we prove that k-constrained reconfigurations are equivalent to k- 
balanced reconfigurations, and do not coincide with unconstrained reconfigura- 
tions. 


Equivalence Between Unconstrained and Locally-Constrained Reconfigurations. 


Lemma 8. Let $P$ be a broadcast protocol, $F \subseteq Q$ be a target set, and $f$ be a
non-decreasing diverging function. If the reconfigurable broadcast network defined 
by P has an initial execution synchronizing in F, then it has an f-constrained 
initial execution synchronizing in F. 


Proof. We first prove the lemma for the identity function Id. More precisely, we prove that for any execution $\rho = (G_i)_{0 \le i \le n}$ of the reconfigurable broadcast network, there exists an Id-constrained execution $\rho' = (G'_j)_{0 \le j \le m}$ whose last transition (if any) is a communication step, and such that for any control state $q$, $|G_n|_q = |G'_m|_q$. We reason by induction on the length of the execution. The claim is obvious for $n = 0$. Suppose the property is true for all naturals less than or equal to some $n \in \mathbb{N}$, and consider an execution $\rho = (G_i)_{0 \le i \le n+1}$. The induction hypothesis ensures that there is an Id-constrained execution $\rho' = (G'_j)_{0 \le j \le m}$ with $|G_n|_q = |G'_m|_q$ for all $q$. If the last transition from $G_n$ to $G_{n+1}$ in $\rho$ is a reconfiguration step, then the execution $\rho'$ witnesses our claim. Otherwise, the transition from $G_n$ to $G_{n+1}$ is a communication step, involving a broadcasting node $n$ of $G_n$, labelled with $q$, and receiving nodes $n_1$ to $n_r$ of $G_n$, respectively labelled with $q_1$ to $q_r$. By hypothesis, $G'_m$ also contains a node $n'$ labelled with $q$ and $r$ nodes $n'_1$ to $n'_r$, labelled with $q_1$ to $q_r$. We then add two steps after $G'_m$ in $\rho'$: we first reconfigure the graph so that the neighbourhood of $n'$ becomes exactly $\{n'_i \mid 1 \le i \le r\}$, which requires changing at most $|G_0| - 1$ links, and then perform the same broadcast/receive transitions as between $G_n$ and $G_{n+1}$.

For the general case of the lemma, suppose $f$ is a non-decreasing diverging function. Further, let $\rho = (G_i)_{0 \le i \le n}$ be an Id-constrained execution, and pick $k$ such that $f(k \cdot |G_0|) \ge |G_0|$ (such a $k$ exists since $f$ diverges). Consider the initial configuration $G_0^k$, made of $k$ copies of $G_0$, and the execution, denoted $\rho^k$, made of $k$ copies of $\rho$ running independently from each of the $k$ copies of $G_0$ in $G_0^k$. Each reconfiguration step involves at most $|G_0|$ links, so that $\rho^k$ is $f$-constrained.
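The choice of $k$ in this proof can be made concrete: for a given $|G_0|$ and bounding function $f$, one can search for the smallest $k$ with $f(k \cdot |G_0|) \ge |G_0|$. The Python sketch below does this by linear search; `min_copies` is a hypothetical name, and the two example functions (floor of the binary logarithm and integer square root) are our own illustrations of non-decreasing diverging bounds, not taken from the text.

```python
import math
from typing import Callable


def min_copies(g0_size: int, f: Callable[[int], int], limit: int = 10**6) -> int:
    """Smallest k >= 1 with f(k * |G_0|) >= |G_0|; such a k exists when f diverges."""
    for k in range(1, limit + 1):
        if f(k * g0_size) >= g0_size:
            return k
    raise ValueError("no suitable k below limit; is f really diverging?")


def floor_log2(n: int) -> int:
    """A slowly diverging, non-decreasing bound (for n >= 1)."""
    return n.bit_length() - 1


print(min_copies(5, floor_log2))  # 7, since floor(log2(35)) = 5 but floor(log2(30)) = 4
print(min_copies(5, math.isqrt))  # 5, since isqrt(25) = 5 but isqrt(20) = 4
```

The slower $f$ grows, the more copies of the Id-constrained execution are needed before each single reconfiguration step fits within the bound.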


Lemma 9. Let $P$ be a broadcast protocol with $F \subseteq Q$ a target set. If the reconfigurable broadcast network defined by $P$ has an initial execution synchronizing in $F$, then it has a 1-locally-constrained initial execution synchronizing in $F$.


k-Constrained and k-Balanced Reconfigurations. We prove here that k- 
constrained and k-balanced reconfigurations are equivalent w.r.t. synchroniza- 
tion, and that they are strictly stronger than our other restrictions. We begin 
with the latter: 
Lemma 10. There exists a broadcast protocol $P$ and a set $F \subseteq Q$ of target states for which synchronization is possible from some initial configuration when unconstrained reconfigurations are allowed, and impossible from every initial configuration when only 1-constrained reconfigurations are allowed.


A protocol with this property is the one from Example 1, for which we exhibited a 2-constrained synchronizing execution. It can be proved that no 1-constrained synchronizing execution exists for this protocol, whatever the number of copies. We now prove the main result of this section:


Theorem 11. Let $P$ be a broadcast protocol and $F \subseteq Q$. There exists a $k$-constrained initial execution synchronizing in $F$ if, and only if, there exists a $k$-balanced initial execution synchronizing in $F$.


Proof. The left-to-right implication is simple: if there is a k-constrained initial 
execution synchronizing in F, w.l.o.g. we can assume that this execution starts 
and ends with a communication step; moreover, each reconfiguration step con- 
tains at most k edge reconfigurations, so that the witness execution is k-balanced. 

Let $\rho = (G_i)_{0 \le i \le n}$ be a $k$-balanced execution synchronizing in $F$ and starting and ending with communication steps (hence $n$ is odd). We define the potential $(p_i)_{0 \le i \le n}$ of $\rho$ as the sequence of $n + 1$ integers obtained as follows:


— $p_0 = 0$;
— $p_{2i+1} = p_{2i} + k$ for $i \le (n-1)/2$ (this corresponds to a communication step);
— $p_{2i+2} = p_{2i+1} - \mathrm{dist}(G_{2i+1}, G_{2i+2})$ for $i \le (n-1)/2 - 1$ (reconfiguration step).


That $\rho$ is $k$-balanced translates as $p_{n-1} \ge 0$: the sequence $(p_i)_{0 \le i \le n}$ stores the value of $k \cdot \mathrm{nb\_comm}(\rho_{\le i}) - \mathrm{nb\_reconf}(\rho_{\le i})$ for each prefix $\rho_{\le i}$ of $\rho$; being $k$-balanced means that $p_n \ge k$, and since the last step is a communication step, this in turn means $p_{n-1} \ge 0$. On the other hand, in order to be $k$-constrained, it is necessary (but not sufficient) to have $p_i \ge 0$ for all $0 \le i \le n$.
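The potential is easy to compute from the sizes of the reconfiguration steps. In the Python sketch below (the function name `potential` is ours), an execution $C\,R\,C \cdots R\,C$ is assumed to be given by the list of values $\mathrm{dist}(G_{2i+1}, G_{2i+2})$; the example checks both conditions discussed above.

```python
from typing import List


def potential(reconf_sizes: List[int], k: int) -> List[int]:
    """p_0..p_n for an execution C R C ... R C (n odd), given the sizes
    dist(G_{2i+1}, G_{2i+2}) of its reconfiguration steps."""
    p = [0]
    for d in reconf_sizes:
        p.append(p[-1] + k)  # communication step: +k
        p.append(p[-1] - d)  # reconfiguration step: -dist
    p.append(p[-1] + k)      # final communication step
    return p


p = potential([3, 0], k=2)
print(p)            # [0, 2, -1, 1, 1, 3]
print(p[-2] >= 0)   # True: the execution is 2-balanced
print(min(p) >= 0)  # False: the necessary condition for 2-constrained fails
```

Since $p_{n-1} = k \cdot (\mathrm{nb\_comm}(\rho) - 1) - \mathrm{nb\_reconf}(\rho)$, testing `p[-2] >= 0` is exactly the inequality of Definition 5.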

We build a $k$-constrained execution by shuffling several copies of $\rho$. We actually begin with the case where $k = 1$, and then extend the proof to any $k$. We first compute how many copies we need. For this, we split $\rho$ into several phases, based on the potential $(p_i)_{0 \le i \le n}$ defined above. A phase is a maximal segment of $\rho_{\le n-1}$ (the prefix of $\rho$ obtained by dropping the last (communication) step) along which the sign of the potential is constant (or zero): graphs $G_i$ and $G_j$ (with $i \le j$) are in the same phase if, and only if, for all $i \le l \le l' \le j$, it holds that $p_l \cdot p_{l'} \ge 0$. We decompose $\rho$ as the concatenation of phases $(\rho_j)_{0 \le j \le m}$; since $\rho$ is $k$-balanced, $m$ is even, and $\rho_0$, $\rho_m$, and all even-numbered phases are non-negative phases (i.e., the potential is non-negative along those executions), while all odd-numbered phases are non-positive phases. Also, all phases end with potential zero, except possibly for $\rho_m$. See Fig. 3 for an example of a decomposition into phases.
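For $k = 1$, the decomposition into phases can be sketched as follows. We assume, as our reading of the proof of Lemma 12 and of Fig. 3, that a reconfiguration step changing $d$ edges is viewed as $d$ consecutive unit reconfiguration steps, so that an execution becomes a word over `C` (potential $+1$) and `R` (potential $-1$) starting and ending with `C`. With unit steps, the sign of $p_i + p_{i+1}$ tells whether step $i$ lies in a non-negative or a non-positive phase. The function names are hypothetical.

```python
from typing import List, Tuple


def unit_potential(word: str) -> List[int]:
    """Potential of a word over {'C', 'R'}: +1 per communication, -1 per unit reconfiguration."""
    p = [0]
    for step in word:
        p.append(p[-1] + (1 if step == "C" else -1))
    return p


def phases(word: str) -> List[Tuple[int, List[int]]]:
    """Maximal runs of steps of word[:-1] in non-negative (+1) or non-positive (-1) phases."""
    p = unit_potential(word)
    result: List[Tuple[int, List[int]]] = []
    for i in range(len(word) - 1):  # the last communication step is dropped
        # p[i] and p[i+1] differ by 1, so their sum is odd and never 0.
        sign = 1 if p[i] + p[i + 1] > 0 else -1
        if result and result[-1][0] == sign:
            result[-1][1].append(i)
        else:
            result.append((sign, [i]))
    return result


def repeated_reconfigurations(word: str, steps: List[int]) -> int:
    """kappa_i: reconfiguration steps of the phase immediately preceded by another one."""
    return sum(1 for i in steps if word[i] == "R" and i > 0 and word[i - 1] == "R")


w = "CRRCCCRC"  # 1-balanced: 3 unit reconfigurations, 5 communications
ph = phases(w)
print(ph)  # [(1, [0, 1]), (-1, [2, 3]), (1, [4, 5, 6])]
print([repeated_reconfigurations(w, steps) for _, steps in ph])  # [0, 1, 0]
```

In this example there are three phases ($m = 2$ is even), the first two end with potential zero, and only the non-positive phase contains a repeated reconfiguration step.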


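Lemma 12. For any phase $\rho_i = G_{b_i} \cdots G_{e_i}$ of a 1-balanced execution $\rho = G_0 \cdots G_n$, there exists $\kappa_i \le (e_i - b_i)/2$ such that for any $N \in \mathbb{N}$, there exists a 1-constrained execution from $G_{b_i}^{\kappa_i} \oplus G_{b_i}^N$ to $G_{e_i}^{\kappa_i} \oplus G_{e_i}^N$.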
Proof. We handle non-negative and non-positive phases separately. In a non-negative phase, we call repeated reconfiguration step any reconfiguration step that immediately follows another reconfiguration step (possibly from the previous phase), so that if there are four consecutive reconfiguration steps, the last three are said to be repeated; similarly, we call repeated communication step any communication step that is immediately followed (possibly in the next phase) by another communication step (hence the first three of four consecutive communication steps are repeated).

We first claim that any non-negative phase contains at least as many repeated 
communication steps as it contains repeated reconfiguration steps. Indeed, any 
non-repeated communication step in a non-negative phase is necessarily followed 
by a non-repeated reconfiguration step, and conversely, and non-negative phases 
have at least as many communication steps as they have reconfiguration steps. 

As a consequence, we can number all repeated reconfiguration steps from 1 (earliest) to $\kappa_i$ (latest), for some $\kappa_i$, and similarly for repeated communication steps. Clearly enough, in a non-negative phase, for any $1 \le j \le \kappa_i$, the repeated communication step numbered $j$ occurs before the repeated reconfiguration step carrying the same number.

We now build our 1-constrained execution from $G_{b_i}^{\kappa_i} \oplus G_{b_i}^N$ to $G_{e_i}^{\kappa_i} \oplus G_{e_i}^N$. We begin with a first part, where only the $N$ components of $G_{b_i}^N$ move:


— the first copy starting in $G_{b_i}$ follows the execution $\rho_i$ until reaching the repeated reconfiguration step number 1. That reconfiguration step cannot be performed immediately as it follows another reconfiguration step. Notice that during this stage, this copy has taken at least one repeated communication step, numbered 1;

— the second copy then follows $\rho_i$ until reaching its first repeated communication step (which must occur before the first repeated reconfiguration step). It takes this communication step, thereby allowing the first copy to perform its first repeated reconfiguration step;

— this simulation continues, each time having the $(l+1)$-st copy of the system take its $j$-th repeated communication step in order to allow the $l$-th copy to perform its $j$-th repeated reconfiguration step. Non-repeated steps can always be performed individually by each single copy. Also, the first copy may always take repeated communication steps not having a corresponding reconfiguration step, as in the first stage of this part.


Notice that the number of copies involved in this process is arbitrary. The process lasts as long as some copies may advance within phase $\rho_i$. Hence, when the process stops, all copies of the original system either have reached the end of $\rho_i$, or are stopped before a repeated reconfiguration step. For the copies in the latter situation, we use the $\kappa_i$ copies of the component $G_{b_i}^{\kappa_i}$. It remains to prove that having $\kappa_i$ such copies is enough to make all processes reach the end of $\rho_i$.

For this, we first assume that the potential associated with $\rho_i$ ends with value zero. This must be the case for all phases except the last one, which we handle after the general case. We first notice that in the execution we are currently building, any repeated communication step performed by any (but the very first) copy that started from $G_{b_i}$ is always followed by a repeated reconfiguration step. Similarly, any non-repeated communication step of a copy is followed by a non-repeated reconfiguration step of the same copy. As a consequence, the potential associated with the global execution we are currently building never exceeds the total number of repeated communication steps performed by the first copy; hence it is bounded by $\kappa_i$, whatever the number $N$ of copies involved. As a consequence, at most $\kappa_i$ communication steps are sufficient in order to advance all copies that started from $G_{b_i}$ to the end of $\rho_i$.

Fig. 3. Phases of a 1-balanced execution, and correspondence between repeated communication steps (loosely dotted blue steps) and repeated reconfiguration steps (densely dotted red steps) (Color figure online)

Finally, the case of the last phase $\rho_m$ (possibly ending with positive potential) is easily handled, since it has more communication steps than reconfiguration steps.

The proof for non-positive phases is similar. 


Pick a 1-balanced execution $\rho = G_0 \cdots G_n$, and decompose it into phases $\rho_0 \cdots \rho_m$. For each phase $\rho_i$, we write $\kappa_i$ for the total number of repeated reconfiguration steps, and we let $\kappa = \sum_{0 \le i \le m} \kappa_i$ be the total number of repeated reconfiguration steps along $\rho$. Notice that $\kappa \le n/2$.


Lemma 13. For every 1-balanced execution $\rho = G_0 \cdots G_n$ and for every $N \in \mathbb{N}$, there exists a 1-constrained execution from $G_0^{\kappa} \oplus G_0^N$ to $G_n^{\kappa+N}$.


Combining the above two lemmas, we obtain the following proposition, which refines the statement of Theorem 11:


Proposition 14. For every 1-balanced execution $\rho = G_0 \cdots G_n$ and every $N \ge \kappa^2 + \kappa$, there exists a 1-constrained execution from $G_0^N$ to $G_n^N$.


We finally extend this result to $k > 1$. In this case, splitting $\rho$ into phases is not as convenient as when $k = 1$: indeed, a non-positive phase might not end with potential zero (because communication steps make the potential jump by $k$ units). Lemma 12 would not hold in this case.

We circumvent this problem by first shuffling $k$ copies of $\rho$ in such a way that reconfigurations can be gathered into groups of size exactly $k$. This way, we can indeed split the resulting execution into non-negative and non-positive phases, always considering reconfigurations of size exactly $k$; we can then apply the techniques above in order to build a synchronizing $k$-constrained execution. This completes our proof.


3 Parameterized Synchronization Under Reconfiguration Constraints


3.1 Undecidability for k-Constrained Reconfiguration 


Although synchronization is decidable in PTIME [6,11] for reconfigurable broad- 
cast networks, the problem becomes undecidable when reconfigurations are k- 
constrained. 


Theorem 15. The synchronization problem is undecidable for reconfigurable 
broadcast networks under k-constrained reconfigurations. 


Proof. We prove this undecidability result for 1-constrained reconfigurations by giving a reduction from the halting problem for Minsky machines [14]. We begin with some intuition. The state space of our protocol has two types of states:


— control states encode the control state of the 2-counter machine;

— counter states are used to model counter values: for each counter $c_j \in \{c_1, c_2\}$, we have a state $\mathit{zero}_j$ and a state $\mathit{one}_j$. The value of counter $c_j$ in the simulation will be encoded as the number of edges in the communication topology between the control node and counter nodes in state $\mathit{one}_j$; moreover, we will require that control nodes have no communication links with counter nodes in state $\mathit{zero}_j$.

Increments and decrements can then be performed by creating a link with a node in $\mathit{zero}_j$ and sending this node to $\mathit{one}_j$, or by sending a $\mathit{one}_j$-node to $\mathit{zero}_j$ and removing the link.
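The counter encoding itself can be illustrated with a toy model that ignores the protocol's safety checks (other control nodes, spurious links, sink states). In the Python sketch below, the control node name `"ctrl"` and the state names `"zero1"`, `"one1"`, and so on are hypothetical; each increment or decrement costs one single-link reconfiguration and one relabelling, matching the 1-constrained setting.

```python
from typing import Dict, FrozenSet, Set

CTRL = "ctrl"  # hypothetical name of the selected control node


def value(labels: Dict[str, str], edges: Set[FrozenSet[str]], j: int) -> int:
    """Counter c_j = number of links between the control node and nodes in one_j."""
    return sum(1 for e in edges if CTRL in e
               and any(labels[m] == f"one{j}" for m in e if m != CTRL))


def increment(labels: Dict[str, str], edges: Set[FrozenSet[str]], j: int) -> None:
    m = next((m for m, q in labels.items() if q == f"zero{j}"), None)
    if m is None:
        raise ValueError("no counter node available in zero_j")
    edges.add(frozenset((CTRL, m)))  # one reconfiguration: create the link
    labels[m] = f"one{j}"            # one communication: move m to one_j


def decrement(labels: Dict[str, str], edges: Set[FrozenSet[str]], j: int) -> None:
    m = next((m for m, q in labels.items()
              if q == f"one{j}" and frozenset((CTRL, m)) in edges), None)
    if m is None:
        raise ValueError("counter c_j is zero")
    labels[m] = f"zero{j}"               # one communication: send m back to zero_j
    edges.discard(frozenset((CTRL, m)))  # one reconfiguration: remove the link


def is_clean(labels: Dict[str, str], edges: Set[FrozenSet[str]]) -> bool:
    """Invariant from the text: no link between the control node and a zero_j node."""
    return not any(CTRL in e and any(labels[m].startswith("zero") for m in e if m != CTRL)
                   for e in edges)


labels = {CTRL: "L", "m1": "zero1", "m2": "zero1", "m3": "zero2"}
edges: Set[FrozenSet[str]] = set()
increment(labels, edges, 1)
increment(labels, edges, 1)
increment(labels, edges, 2)
decrement(labels, edges, 1)
print(value(labels, edges, 1), value(labels, edges, 2), is_clean(labels, edges))  # 1 1 True
```

A zero test then amounts to checking that the control node has no link to a node in $\mathit{one}_j$; in the actual protocol, this and the cleanliness invariant are enforced by broadcasts rather than inspected directly.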


In order to implement this, we have to take care of the facts that we may have 
several control nodes in our network, that we may have links between two control 
nodes or between two counter nodes, or that links between control nodes and 
counter nodes may appear or disappear at random. Intuitively, those problems 
will be handled as follows: 


— we cannot avoid having several control nodes; instead, given a synchronizing 
execution of the broadcast protocol, we will select one control node and show 
that it encodes a correct execution of the 2-counter machine; 

— in order to reach a synchronizing configuration, the selected control node 
will have to perform at least as many reconfiguration steps as broadcast 
steps. Because we consider 1-constrained runs, it will perform exactly the 
same number of reconfiguration steps as broadcast steps, so that no useless / 
unexpected reconfigurations may take place during the simulation; 
Fig. 5. Modules for simulating incrementation and decrementation/zero test

Fig. 6. The part of the protocol for counter nodes

Fig. 7. Parts of the protocol for auxiliary nodes
— control nodes will periodically run special broadcasts that would send any connected nodes (except nodes in state $\mathit{one}_j$) to a sink state, thus preventing synchronization. This way, we ensure that the selected control node is clean. Initially, we require that control nodes have no connections at all.


We now present the detailed construction, depicted in Figs. 4, 5, 6 and 7.
Each state of the protocol is actually able to synchronize with all the messages.
Some transitions are not represented in the figures, to preserve readability: all
states with no outgoing transitions (i.e., the state corresponding to the halting
state of the machine, as well as states zero'_i and done_j) actually carry a self-loop
synchronizing on all messages; all other omitted transitions lead to a sink state,
which is not part of the target set.

Let us explain the intended behaviour of the incrementation module of Fig. 5:
when entering the module, our control node n in state L is linked to c_1 counter
nodes in state one_1, and to c_2 counter nodes in state one_2; it has no other links.
Moreover, all auxiliary nodes are either in state free_j or in state done_j. Running
through the incrementation module from L will use one counter node m in
state zero_i (which is used to effectively encode the increase of counter c_i) and
four auxiliary nodes a_1 (initially in state free_1), a_2 (in state free_2), and a_3 and a_4
(in state free_3).

The execution then runs as follows: 


— a link is created between the control node n and the first auxiliary node a_1,
followed by a message exchange !!i-init_i;

— a link is created between n and m, and node a_1 broadcasts !!fr_1;

— a link is created between n and a_2, and n broadcasts !!i-ask_i, which is received
by both a_2 and m;

— a link is created between n and a_3; node m sends its acknowledgement !!i-ack_i
to n;

— a link is created between n and a_4; node n sends !!i-ok_i, received by m, a_2, a_3
and a_4;

— the link between n and a_1 is removed, and a_2 sends !!fr_2;

— the link between n and a_2 is removed, and a_3 sends !!fr_3;

— the link between n and a_3 is removed, and a_4 sends !!fr_4;

— finally, the link between n and a_4 is removed, and n sends !!i-exit_i.


After this sequence of steps, node n has an extra link to a counter node in
state one_i, which indeed corresponds to incrementing counter c_i. Moreover, no
nodes have been left in an intermediary state. A similar analysis can be done for
the second module, which implements the zero-test and decrementation. This
way, we can prove that if the two-counter machine has a halting computation,
then there is an initial configuration of our broadcast protocol from which there
is an execution synchronizing in the set F formed of the halting control state
and states one_i, zero'_i and done_j.
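As a sanity check of the step counting used in this section, the nine steps of the incrementation module listed above can be replayed on the neighbour set of the control node n. This is a minimal sketch of our own: it tracks only which nodes are linked to n, and the node names (m, a1 to a4, and the hypothetical counter nodes c_a, c_b) and the list-based trace format are assumptions, not part of the construction. It confirms that the module performs nine broadcasts and nine atomic reconfigurations while leaving exactly one extra link, to the counter node m.

```python
# Hypothetical replay of the incrementation module described above.
# Each step is (reconfiguration, broadcast message); a reconfiguration is
# ("link", node) or ("unlink", node). Node names m, a1..a4 follow the text.
INCREMENT_STEPS = [
    (("link", "a1"), "i-init"),
    (("link", "m"), "fr1"),
    (("link", "a2"), "i-ask"),
    (("link", "a3"), "i-ack"),
    (("link", "a4"), "i-ok"),
    (("unlink", "a1"), "fr2"),
    (("unlink", "a2"), "fr3"),
    (("unlink", "a3"), "fr4"),
    (("unlink", "a4"), "i-exit"),
]


def replay(links, steps):
    """Apply the steps to the set of neighbours of n.

    Returns the final neighbour set and the numbers of reconfiguration
    and communication steps performed.
    """
    links = set(links)
    reconfigurations = communications = 0
    for (action, node), _message in steps:
        if action == "link":
            assert node not in links, f"{node} is already linked to n"
            links.add(node)
        else:
            assert node in links, f"{node} is not linked to n"
            links.remove(node)
        reconfigurations += 1
        communications += 1  # in this module, one broadcast follows each reconfiguration
    return links, reconfigurations, communications


# n starts linked to two (hypothetical) counter nodes in state one_i.
final_links, r, c = replay({"c_a", "c_b"}, INCREMENT_STEPS)
print(sorted(final_links), r, c)  # ['c_a', 'c_b', 'm'] 9 9
```

The counter value, read as the number of links from n to one_i-nodes, thus grows from 2 to 3 once m has moved to one_i.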

It now remains to prove the other direction. More precisely, we prove that 
from a 1-constrained synchronizing execution of the protocol, we can extract a
synchronizing execution in some normal form, from which we derive a halting
execution of the two-counter machine. 

Fix a 1-constrained synchronizing execution of the broadcast network. First
notice that when a control node n reaches some state L (the first state of an
incrementation or decrementation module), it may only be linked to counter nodes in
state one_i: this is because states L can only be reached by sending !!i-exit, !!d-exit,
!!t-exit, or !!start. The former two cases may only synchronize with counter nodes
in state one_i; in the other two cases, node n may be linked to no other node.
Hence, for a control node n to traverse an incrementation module, it must get
links to four auxiliary nodes (in order to receive the four fr messages), those four
links must be removed (to avoid reaching the sink state), and an extra link has
to be created in order to receive message i-ack_i. In total, traversing an
incrementation module takes nine communication steps and at least nine reconfiguration
steps. Similarly, traversing a decrementation module via either of its two branches
takes at least as many reconfiguration steps as communication steps. In the end,
taking into account the initial !!start communication step, if a control node n
is involved in B_n communication steps, it must be involved in at least B_n − 1
reconfiguration steps.

Assume that every control node n is involved in at least B_n reconfiguration
steps: then we would have at least as many reconfiguration steps as
communication steps, which in a 1-constrained execution is impossible. Hence there must
be a control node n_0 performing B_{n_0} communication steps and exactly B_{n_0} − 1
reconfiguration steps. As a consequence, when traversing an incrementation
module, node n_0 indeed gets connected to exactly one new counter node, which
indeed must be in state one_i when n_0 reaches the first state of the next
module. Similarly, traversing a decrementation/zero-test module indeed performs the
expected changes. It follows that the sequence of steps involving node n_0 encodes
a halting execution of the two-counter machine.


The 1-constrained executions in the proof of Theorem 15 have the additional 
property that all graphs describing configurations are 2-bounded-path
configurations. For K ∈ ℕ, a configuration G is a K-bounded-path configuration if the
length of all simple paths in G is bounded by K. Note that a constant bound on 
the length of simple paths implies that the diameter (i.e. the length of the longest 
shortest path between any pair of vertices) is itself bounded. The synchronization 
problem was proved to be undecidable for broadcast networks without reconfig- 
uration when restricting to K-bounded-path configurations [6]. In comparison, 
for reconfigurable broadcast networks under k-constrained reconfigurations, the 
undecidability result stated in Theorem 15 can be strengthened into: 


Corollary 16. The synchronization problem is undecidable for reconfigurable 
broadcast networks under k-constrained reconfigurations when restricted either 
to bounded-path configurations, or to bounded-diameter configurations. 
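The two notions above can be made concrete with a small brute-force check. This sketch assumes that path length is measured in edges and that configurations are given as undirected adjacency maps; the helper names are ours. It illustrates that a star centred on one node, the kind of topology built around a control node in the proof of Theorem 15, is 2-bounded-path, and that the diameter never exceeds the longest simple path, since every shortest path is itself simple.

```python
from collections import deque


def undirected(edges):
    """Build a symmetric adjacency map from a list of (u, v) edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def longest_simple_path(graph):
    """Length (in edges) of the longest simple path, by exhaustive DFS."""
    best = 0

    def dfs(node, visited, length):
        nonlocal best
        best = max(best, length)
        for succ in graph[node]:
            if succ not in visited:
                visited.add(succ)
                dfs(succ, visited, length + 1)
                visited.remove(succ)

    for start in graph:
        dfs(start, {start}, 0)
    return best


def diameter(graph):
    """Longest shortest path between connected vertices (BFS from each);
    pairs lying in different components are ignored."""
    best = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for succ in graph[node]:
                if succ not in dist:
                    dist[succ] = dist[node] + 1
                    queue.append(succ)
        best = max(best, max(dist.values()))
    return best


def is_k_bounded_path(graph, k):
    return longest_simple_path(graph) <= k


# A star: node n linked to three other nodes.
star = undirected([("n", "x1"), ("n", "x2"), ("n", "x3")])
# A 4-cycle: small diameter, but longer simple paths.
cycle = undirected([(0, 1), (1, 2), (2, 3), (3, 0)])
print(longest_simple_path(star), diameter(star))    # 2 2
print(longest_simple_path(cycle), diameter(cycle))  # 3 2
```

The exhaustive DFS is exponential in general; it is only meant for checking tiny configurations by hand.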
3.2 Decidability Results


f-Constrained and k-Locally-Constrained Reconfigurations. From the equiva- 
lence (w.r.t. synchronization) of k-locally-constrained, f-constrained and uncon- 
strained executions (Lemmas 8 and 9), and thanks to Theorem 2, we immediately 
get: 


Corollary 17. Let k ∈ ℕ and f: ℕ → ℕ be a non-decreasing diverging
function. The synchronization problem for reconfigurable broadcast networks under
k-locally-constrained (resp. f-constrained) reconfigurations is decidable in PTIME.


Bounded Degree Topology. We now return to k-constrained reconfigurations, and 
explore restrictions that allow one to recover decidability of the synchronization 
problem. We further restrict k-constrained reconfigurations by requiring that the
degree of nodes remains bounded by 1; in other words, communications
correspond to rendez-vous between the broadcasting node and its single neighbour.


Theorem 18. The synchronization problem is decidable for reconfigurable 
broadcast networks under k-constrained reconfiguration when restricted to 1- 
bounded-degree topologies. 


Sketch of Proof. The proof consists in transforming the synchronization problem 
above into a reachability problem for some Petri net. The Petri net has two kinds 
of places (plus a few auxiliary places): one place for each state of the protocol, 
representing isolated nodes (i.e., nodes having no neighbours), and one place for 
each pair of states of the protocol, representing pairs of connected nodes. Since 
we restrict to degree-1 topologies, any node of the network is in one of those 
two configurations. Places representing isolated nodes are simply called isolated 
places in the sequel, while places corresponding to pairs of connected nodes are
called connected places. 

An initialization phase stores tokens in the places described above, so as to 
represent the initial configuration. In a second phase, the Petri net simulates 
an execution of the reconfigurable broadcast network: communication steps and 
(k-constrained) reconfiguration steps are easily encoded as transitions of this 
Petri net: communication steps correspond to moving tokens from one place to 
the place obtained by updating the states as prescribed by the transitions of the 
broadcast protocol. Atomic reconfigurations may create or remove links, either 
consuming two tokens in isolated places and adding a token in the corresponding 
connected place, or the other way around. We use k auxiliary places to
count the number of atomic reconfigurations, so as to enforce the k-constraint.

Finally, the Petri net may enter a terminal phase, where it checks
synchronization by absorbing all tokens that lie in (isolated or connected) places
corresponding to target states. In the end, the simulated execution has been
synchronizing if, and only if, no tokens remain in any of the main places.
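The marking manipulations of this proof sketch can be illustrated in a few lines. The class below is a hypothetical sketch, not the construction of the paper: isolated places are a counter indexed by states, connected places a counter indexed by unordered pairs of states, communication steps move one token, and link/unlink move tokens between the two kinds of places. The rule used for the k-constraint (at most k atomic reconfigurations after each communication step, none before the first) is an illustrative assumption standing in for the k auxiliary places; the protocol is also assumed complete, i.e. every state has a successor for every message.

```python
from collections import Counter


def pair(p, q):
    """Links are undirected, so connected places are keyed by sorted pairs."""
    return tuple(sorted((p, q)))


class DegreeOnePetriNet:
    """Hypothetical marking of the Petri net described in the proof sketch.

    send[(q, msg)] / recv[(q, msg)] give the successor state of a
    broadcasting / receiving node (the protocol is assumed complete).
    Assumption, for illustration only: at most k atomic reconfigurations
    are allowed after each communication step, and none before the first.
    """

    def __init__(self, k, send, recv, isolated, connected=()):
        self.k, self.send, self.recv = k, send, recv
        self.isolated = Counter(isolated)    # one place per state
        self.connected = Counter(connected)  # one place per pair of states
        self.budget = 0                      # stands in for the k auxiliary places

    @staticmethod
    def _take(place, *keys):
        """Remove one token per key, failing without side effects."""
        need = Counter(keys)
        if any(place[key] < n for key, n in need.items()):
            raise ValueError(f"not enough tokens for {keys!r}")
        place.subtract(need)
        for key in need:
            if place[key] == 0:
                del place[key]

    def _check_budget(self):
        if self.budget == 0:
            raise ValueError("k-constraint exhausted")

    # Communication steps move tokens and refill the reconfiguration budget.
    def broadcast_isolated(self, q, msg):
        new = self.send[(q, msg)]
        self._take(self.isolated, q)
        self.isolated[new] += 1
        self.budget = self.k

    def broadcast_connected(self, sender, receiver, msg):
        new = pair(self.send[(sender, msg)], self.recv[(receiver, msg)])
        self._take(self.connected, pair(sender, receiver))
        self.connected[new] += 1
        self.budget = self.k

    # Atomic reconfigurations move tokens between isolated and connected places.
    def link(self, p, q):
        self._check_budget()
        self._take(self.isolated, p, q)
        self.connected[pair(p, q)] += 1
        self.budget -= 1

    def unlink(self, p, q):
        self._check_budget()
        self._take(self.connected, pair(p, q))
        self.isolated[p] += 1
        self.isolated[q] += 1
        self.budget -= 1

    def is_synchronizing(self, targets):
        """Terminal phase: every remaining token lies in a target place."""
        return all(q in targets for q in self.isolated) and all(
            p in targets and q in targets for p, q in self.connected)
```

For instance, with two isolated nodes in a hypothetical state "a" and k = 1, one broadcast followed by one link is accepted, whereas a second reconfiguration before the next broadcast raises an error. Keying connected places by sorted pairs mirrors the fact that a link has no orientation.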
4 Conclusion


Restricting reconfigurations in reconfigurable broadcast networks is natural to 
better reflect mobility when communications are frequent enough and the move- 
ment of nodes is not chaotic. In this paper, we studied how constraints on the 
number of reconfigurations (at each step and for each node, at each step and 
globally, or along an execution) change the semantics of networks, in partic- 
ular with respect to the synchronization problem, and affect its decidability. 
Our main results are the equivalence of k-constrained and k-balanced semantics, 
the undecidability of synchronization under k-constrained reconfigurations, and 
its decidability when restricting to 1-bounded-degree topologies. 

As future work, we propose to investigate, beyond the coverability and syn- 
chronization problems, richer objectives such as cardinality reachability prob- 
lems as in [5]. Moreover, for semantics with constrained reconfigurations that 
are equivalent to the unconstrained one as far as the coverability and synchro- 
nization problems are concerned, it would be worth studying the impact of the 
reconfiguration restrictions (e.g. k-locally-constrained or f-constrained) on the 
minimum number of nodes for which a synchronizing execution exists, and on 
the minimum number of steps to synchronize. 
Abstract. Nearly all web-based interfaces are written in JavaScript.
Given its prevalence, the support for high performance JavaScript code 
is crucial. The ECMA Technical Committee 39 (TC39) has recently 
extended the ECMAScript language (i.e., JavaScript) to support shared 
memory accesses between different threads. The extension is given in 
terms of a natural language memory model specification. In this paper 
we describe a formal approach for validating both the memory model and 
its implementations in various JavaScript engines. We first introduce a 
formal version of the memory model and report results on checking the 
model for consistency and other properties. We then introduce our tool, 
EMME, built on top of the Alloy analyzer, which leverages the model 
to generate all possible valid executions of a given JavaScript program. 
Finally, we report results using EMME together with small test programs 
to analyze industrial JavaScript engines. We show that EMME can find 
bugs as well as missed opportunities for optimization. 


1 Introduction 


As web-based applications written in JavaScript continue to increase in com- 
plexity, there is a corresponding need for these applications to interact effi- 
ciently with modern hardware architectures. Over the last decade, processor 
architectures have moved from single-core to multi-core, with the latter now 
present in the vast majority of both desktop and mobile platforms. In 2012, an 
extension to JavaScript was standardized [20] which supports the creation of 
multi-threaded parallel Web Workers with message-passing. More recently, the 
committee responsible for JavaScript standardization extended the language to 
support shared memory access [10]. This extension integrates a new datatype 
called SharedArrayBuffer which allows for concurrent memory accesses, thus
enabling more efficient multi-threaded program interaction. 

Given a multi-threaded program that uses shared memory, there can be several 
possible valid executions of the program, given that reads and writes may concur- 
rently operate on the same shared memory and that every thread can have a differ- 
ent view of it. However, not all behaviors are allowed, and the separation between 
valid and invalid behaviors is defined by a memory model. In one common app- 
roach, memory models are specified using axioms, and the correctness of a program 
execution is determined by checking its consistency with the axioms in the mem- 
ory model. Given a set of memory operations (i.e., reads and writes) over shared 
memory, the memory model defines which combinations of written values each read 
event can observe. Because many different programs can have the same behaviors, 
the memory model is also particularly important for helping to determine the set of 
possible optimizations that a compiler can apply to a given program. As an exam- 
ple, a memory model could specify that the only allowed multi-threaded executions 
are those that are equivalent to a sequential program composed of some interleav- 
ing of the events in each thread. This model is the most stringent one and is called 
sequential consistency. With this approach, all threads observe the same total order 
of events. However, this model has significant performance limitations. In particu- 
lar, it requires all cores/processors to synchronize their local cache with each other 
in order to maintain a coherent order of the memory events. In order to overcome 
such limitations, weaker memory models have been introduced. The ECMAScript 
Memory Model is a weak model. 
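The interleaving characterization of sequential consistency can be checked by brute force on a tiny program. The sketch below uses the classic store-buffering pattern (our choice of example, not one taken from this paper), plain Python integers instead of shared byte arrays, and a hypothetical tuple encoding of reads and writes. It enumerates every interleaving that respects each thread's own order and collects the register values: under SC the outcome where both reads return 0 never appears, whereas weaker models such as TSO allow it.

```python
from typing import Iterator, List, Tuple

# ("W", location, value) writes a constant; ("R", location, register) reads.
Op = Tuple[str, str, object]


def interleavings(threads: List[List[Op]]) -> Iterator[List[Op]]:
    """All merges of the threads that preserve each thread's own order."""
    if all(not t for t in threads):
        yield []
        return
    for i, thread in enumerate(threads):
        if thread:
            rest = threads[:i] + [thread[1:]] + threads[i + 1:]
            for tail in interleavings(rest):
                yield [thread[0]] + tail


def sc_outcomes(threads: List[List[Op]], registers: List[str]):
    """Register valuations reachable by some sequential interleaving."""
    outcomes = set()
    for schedule in interleavings(threads):
        memory, regs = {}, {}
        for kind, loc, arg in schedule:
            if kind == "W":
                memory[loc] = arg
            else:
                regs[arg] = memory.get(loc, 0)  # locations start at 0
        outcomes.add(tuple(regs[r] for r in registers))
    return outcomes


# Store buffering: T1: x = 1; r1 = y    T2: y = 1; r2 = x
sb = [[("W", "x", 1), ("R", "y", "r1")],
      [("W", "y", 1), ("R", "x", "r2")]]
print(sorted(sc_outcomes(sb, ["r1", "r2"])))  # [(0, 1), (1, 0), (1, 1)]
```

The number of interleavings grows combinatorially with program size, which is one reason tools in this area rely on solvers rather than enumeration.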

Memory models are notoriously challenging to analyze with conventional test- 
ing alone, due to their non-intuitive semantics and formal axiomatic definitions. 
As a result, formal methods are frequently used in order to verify and validate the
correctness of memory models [4–7,18]. Some of these models apply to instruction
set architectures, whereas others apply to high-level programming languages.
In this work, we use formal methods to validate the ECMAScript Memory Model 
and to analyze the correctness and performance of different implementations of 
ECMAScript engines. JavaScript is usually regarded as a high-level programming 
language, but its memory model is decidedly low-level and more closely matches 
that of instruction set architectures than that of other languages. The analyses that 
we provide are based on a formalization of the memory model using the Alloy lan- 
guage [12], which is then combined with a formal translation of the program to be 
analyzed in order to compute its set of valid executions. This result can then be used 
to automatically generate litmus tests that can be run on a concrete ECMAScript 
engine, allowing the developers to evaluate its correctness. The concrete execu- 
tions observed when running the ECMAScript engine can either be a subset of, be 
equivalent to, or be a superset of the valid executions. Standard litmus test analyses 
usually target the latter case (incorrect engine behavior), providing little
information in the other cases. However, when the concrete engine’s observed executions
are a relatively small subset of the valid executions (e.g., 1/5 the size), this can
indicate a missed opportunity for code optimization. As part of our work, we
introduce a novel approach for such cases that is able to identify specific predicates over
the memory model that are always consistent with the executions of the concrete
engine, thus providing guidance about where potential optimization opportunities 
might exist. 

The analyses proposed in this paper have been implemented in a tool 
called ECMAScript Memory Model Evaluator (EMME), which has been 
used to validate the memory model and to test the compliance of all major 
ECMAScript engines, including Google’s V8 [1], Apple’s JSC [2], and Mozilla’s 
SpiderMonkey [3]. 

The rest of the paper is organized as follows: Sect. 2 covers related work on 
formal analysis of memory models; Sect. 3 describes the ECMAScript Memory 
Model and its formal representation; Sect. 4 characterizes the analyses that are 
presented in this paper; Sect. 5 provides an overview of the Alloy translation;
Sect. 6 concentrates on the tool implementation and the design choices that
were made; Sect. 7 provides an evaluation of the performance of the different
techniques proposed in this paper; Sect. 8 describes the results of the analyses
performed on the ECMAScript Memory Model and several specific engine
implementations; and Sect. 9 provides concluding remarks.


2 Related Work 


Most modern multiprocessor systems implement relaxed memory models,
enabling them to deliver better performance than stricter models.
Well-known models such as Sequential Consistency (SC), Processor Consistency
(PC), Relaxed-Memory Order (RMO), Total Store Order (TSO), and
Partial Store Order (PSO) differ mainly in the constraints they place
on when read and write operations can be reordered.

The formal analysis of weak memory model hardware implementations has 
typically been done using SAT-based techniques [5,9]. In [4], a formal analysis 
based on Coq is used in order to evaluate SC, TSO, PSO, and RMO memory 
models. The DIY tool developed in [4] generates assembly programs to run 
against Power and x86 architectures. In contrast, in this work we concentrate on 
the analysis of the ECMAScript memory model, assuming the processor behavior 
is correct. 

MemSAT [19] is a formal tool, based on Alloy [12], that allows for the verifi- 
cation of axiomatic memory models. Given a program enriched with assertions, 
MemSAT finds a trace execution (if it exists) where both assertions and the 
axioms in the memory model are satisfied. 

An analysis of the C++ memory model is presented in [6]. The formalization 
is based on the LEM language [17], and the CPPMem software provides all 
possible interpretations of a C/C++ program consistent with the memory model. 
More recently, an approach based on Alloy and oriented towards synthesizing 
litmus tests is proposed in [14]. 

In this paper, we build on ideas present in MemSAT and CPPMem to build a 
tool for JavaScript. Our EMME tool can provide the set of valid executions for a 
given input JavaScript program, and it can also generate litmus tests suitable for 
evaluating the correctness of JavaScript engine implementations. In contrast to
previous work, we also analyze situations where the litmus tests provide correct 
results but expose a discrepancy between the number of observed behaviors in 
the implementation and what is possible given the specification. 


Fig. 1. Concurrent program example (events in parentheses):
Thread 1: init x = 0 ($ev_1W^1$)
Thread 2: x-I8[0] = 1 ($ev_2W^2$); print(x-I16[0]) ($ev_3R^2$)
Thread 3: ite(x-I8[0] == 1, x-I8[0] = 3, x-I8[1] = 3) ($ev_4R^3$, $ev_5W^3$, $ev_6W^3$)

Fig. 2. Shared memory views of block x (x-I8[0], x-I16[1], x-F32[0])


3 The ECMAScript Memory Model 


The objective of the ECMAScript Memory Model is to precisely define when an 
execution of a concurrent program that relies on shared memory is valid. From 
the point of view of the Memory Model, a JavaScript program can be abstracted 
as a set of threads, each of them composed of an ordered set of shared memory 
events. Each memory event has a set of attributes that specify its: operation 
(Read, Write, or ReadModifyWrite); ordering (SeqCst, Unordered, or Init); tear
type (whether a single read operation can read from two different writes to the 
same location); (source or destination) memory block and address; payload value; 
and modify operation (in the case of a ReadModifyWrite). The shared memory is 
essentially an array of bytes, and a memory operation reads, writes, or modifies 
it. In these operations, the bytes can be interpreted either as signed/unsigned 
integer values or as floating point values. For instance, in Fig. 2, the notation
x-I16[1] represents an access to the memory block x starting at index 1, where
the bytes are interpreted as 16-bit signed integers (i.e., I16), while x-F32[0]
stands for a 32-bit floating point value starting at byte 0.

Formally, a program is defined as a set of events E and a partial order between
them, namely the Agent Order, that encodes the thread structure. For the example
in Fig. 1, the set of events is defined as $E = \{ev_1W^1, ev_2W^2, ev_3R^2, ev_4R^3, ev_5W^3, ev_6W^3\}$,
with agent order $AO = AO^1 \cup AO^2 \cup AO^3$, where $AO^1$, $AO^2$, and $AO^3$
are the agent orders for each thread: $AO^1 = \emptyset$, $AO^2 = \{(ev_2W^2, ev_3R^2)\}$, and
$AO^3 = \{(ev_4R^3, ev_5W^3), (ev_4R^3, ev_6W^3), (ev_5W^3, ev_6W^3)\}$.

The execution semantics of a program is given by the Reads Bytes From 
(RBF) relation, a ternary relation which relates two events and a single byte
index $i$, with the interpretation that the first event reads the byte at index $i$
which was written by the second event. Looking again at the example in Fig. 1,
one of the possible valid assignments to the RBF relation is
$\{(ev_4R^3, ev_1W^1, 0), (ev_3R^2, ev_2W^2, 0), (ev_3R^2, ev_6W^3, 1)\}$, meaning that the Read
event $ev_4R^3$ reads byte 0 from $ev_1W^1$ (taking the else branch), and $ev_3R^2$ reads
byte 0 from $ev_2W^2$ and byte 1 from $ev_6W^3$.
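To make the role of RBF concrete, the sketch below encodes the assignment above for the program of Fig. 1 and derives two things from it: the Reads From relation (RF, obtained by dropping the byte index, as described among the supporting relations) and the concrete value returned by each read. The byte payloads are our reading of Fig. 1; the four-byte block size and the little-endian byte order (as on the most common platforms) are assumptions, and the helper names are hypothetical.

```python
# Bytes written by each write event of Fig. 1 (byte index -> value).
# Assumption: the Init write ev1 zero-fills a 4-byte block.
WRITTEN = {
    "ev1": {0: 0, 1: 0, 2: 0, 3: 0},  # init x = 0
    "ev2": {0: 1},                    # x-I8[0] = 1
    "ev5": {0: 3},                    # then branch: x-I8[0] = 3
    "ev6": {1: 3},                    # else branch: x-I8[1] = 3
}

# The RBF assignment discussed in the text: (reader, writer, byte index).
RBF = {("ev4", "ev1", 0), ("ev3", "ev2", 0), ("ev3", "ev6", 1)}


def reads_from(rbf):
    """RF: project RBF onto its first two components."""
    return {(reader, writer) for reader, writer, _ in rbf}


def read_value(reader, rbf, written, signed=True):
    """Assemble the integer seen by `reader` (little-endian assumed)."""
    byte_map = {i: written[w][i] for r, w, i in rbf if r == reader}
    raw = bytes(byte_map[i] for i in sorted(byte_map))
    return int.from_bytes(raw, "little", signed=signed)


print(read_value("ev4", RBF, WRITTEN))  # 0: condition false, else branch taken
print(read_value("ev3", RBF, WRITTEN))  # 769 = 1 + 3 * 256, printed by Thread 2
```

This also illustrates why payloads can be left out of the formal model: once RBF is fixed, each read's value follows from the program text.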

The combination of a (finite) set of events $E = \{e_1, \ldots, e_n\}$, an agent order
$AO \subseteq E \times E$, and a Reads Bytes From relation $RBF \subseteq E \times E \times \mathbb{N}$ identifies a
Candidate Execution, and the purpose of the Memory Model is to partition this
set into Valid and Invalid executions. The separation is defined as a formula that 
is satisfiable if and only if the Candidate Execution is Valid. Given a Candidate 
Execution, the Memory Model constructs a set of supporting relations in order 
to assess its validity: 


— Reads From (RF): a binary relation that generalizes RBF by dropping the 
byte location; 

— Synchronizes With (SW): the synchronization relation between sequentially 
consistent writes and reads; 

— Happens Before (HB): a partial order relation between all events; 

— Memory Order (MO): a total order relation between sequentially consistent 
events. 


Finally, a Candidate Execution is valid when the following predicates hold: 


— Coherent Reads (CR): RF and HB relations are consistent; 

— Tear Free Reads (TFR): for reads and writes for which the tear attribute is 
false, a single read event cannot read from two different write events (both of 
which are to the same memory address); 

— Sequentially Consistent Atomics (SCA): the MO relation is not empty.


3.1 Formal Representation 


The formalization of the ECMAScript Memory Model is based on the formal 
definition of a Memory Operation, shown in Definition 1. 


Definition 1 (Memory Operation). A Memory Operation is a tuple (ID,
O, T, R, B, M, A) where:


— ID is a unique event identifier;

— $O \in \{\text{Read (R)}, \text{Write (W)}, \text{ReadModifyWrite (M)}\}$ is the operation;

— $T \in \mathbb{B}$ is the Tear attribute;

— $R \in \{\text{Init (I)}, \text{SeqCst (SC)}, \text{Unordered (U)}\}$ is the order attribute;

— B is the name of a Shared Data Block;

— M is a set of integers representing the memory addresses in B accessed by
the operation O, with the requirement that $M = \{i \in \mathbb{N} \mid ByteIndex \le i < ByteIndex + ElementSize\}$,
for some $ByteIndex, ElementSize \in \mathbb{N}$;

— $A \in \mathbb{B}$ is an Activation attribute.
Note that this definition differs slightly from the one used in [10] (though the
underlying semantics are the same). The differences make the model easier to 
reason about formally and include: 


— In [10], the memory address range for an operation is represented by two 
numbers, the ByteIndex and the ElementSize, whereas in Definition 1, we 
represent the memory address range explicitly as a set of bytes (which must 
contain some set of consecutive numbers, so the two representations are equiv- 
alent). This representation allows for a simpler encoding of some operators 
like computing the intersection of two address ranges. 

— Definition 1 omits the payload and modify operation attributes, as these are 
only needed to compute the concrete value(s) of the data being read or writ- 
ten. The formal model does not need to reason about such concrete values 
in order to partition candidate executions into valid and invalid ones. Fur- 
thermore, for any specific candidate execution of a JavaScript program, these 
values can be computed from the original program using the RBF relation. 

— The activation attribute A is an extension used to encode whether an event 
should be considered active based on the control flow path taken in an exe- 
cution. In particular, we model if-then-else statements by enabling or dis- 
abling the events in the then and else branches depending on the value of the 
condition. 
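A direct transcription of Definition 1 can help when experimenting with the model outside Alloy. The dataclass below is a hypothetical sketch: field names mirror the tuple (ID, O, T, R, B, M, A), the address set M is built from ByteIndex and ElementSize as the definition requires, and the overlap helper shows how an explicit byte set turns range intersection into a single set operation. The attribute values in the usage lines (ordering, tear flag) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet


class Op(Enum):
    READ = "R"
    WRITE = "W"
    READ_MODIFY_WRITE = "M"


class Order(Enum):
    INIT = "I"
    SEQ_CST = "SC"
    UNORDERED = "U"


@dataclass(frozen=True)
class MemoryOperation:
    event_id: str               # ID: unique event identifier
    op: Op                      # O: operation
    tear: bool                  # T: Tear attribute (TFR applies when False)
    order: Order                # R: order attribute
    block: str                  # B: Shared Data Block name
    addresses: FrozenSet[int]   # M: consecutive byte indices in B
    active: bool = True         # A: activation attribute (control flow)


def byte_range(byte_index: int, element_size: int) -> FrozenSet[int]:
    """M = {i | ByteIndex <= i < ByteIndex + ElementSize}."""
    return frozenset(range(byte_index, byte_index + element_size))


def overlap(a: MemoryOperation, b: MemoryOperation) -> FrozenSet[int]:
    """Bytes accessed by both operations (empty for different blocks)."""
    if a.block != b.block:
        return frozenset()
    return a.addresses & b.addresses


# Thread 2 of Fig. 1: x-I8[0] = 1 and print(x-I16[0]).
w = MemoryOperation("ev2", Op.WRITE, False, Order.UNORDERED, "x", byte_range(0, 1))
r = MemoryOperation("ev3", Op.READ, False, Order.UNORDERED, "x", byte_range(0, 2))
print(sorted(overlap(w, r)))  # [0]
```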


All relations in [10] (i.e., RBF, RF, SW, HB, and MO) are included in the 
formal model, and their semantics are defined using set operations, while the 
predicates (i.e., CR, TFR, and SCA) are expressed as formulas. The resulting 
formulation of the Memory Model, combining all constraints and predicates, is 
shown in Eq. (1). Details of our implementation of this formulation are given in 
Sect. 5. 


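$$\begin{aligned}
MM(E, AO, RF, RBF, SW, HB, MO) := {} & \varphi_{RBF}(RBF, E) \land \varphi_{RF}(RF, E, RBF) \\
& \land \varphi_{SW}(SW, E, RF) \land \varphi_{HB}(HB, E, AO, SW) \land \varphi_{MO}(MO, E, HB, SW) \\
& \land CR(E, HB, RBF) \land TFR(E, RF) \land SCA(MO)
\end{aligned} \tag{1}$$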


4 Formal Analyses 


The design and development of a critical (software or hardware) system often 
follows a process in which high-level requirements (such as the standards commit- 
tee’s specification of the memory model) are used to guide an actual implemen- 
tation. This process can be integrated with different formal analyses to ensure 
that the result is a faithful implementation with respect to the requirements. In 
this section, we describe the set of analyses that we used to validate the require- 
ments and implementations of the ECMAScript Memory Model. Results of our 
analyses are reported in Sect. 8. 
4.1 Formal Requirements Validation


The ECMAScript Memory Model defines a set of constraints which together 
make up a formula (Eq. (1)). The solutions of this formula are the valid execu- 
tions. The Memory Model also lists a number of assertions, formulas that are 
expected to be true in every valid execution (and thus must follow from the 
constraints). Complete formal requirements validation would require checking 
two things: (i) the constraints are consistent with each other, i.e. they contain 
no contradictions; and (ii) each assertion is logically entailed by the set of con- 
straints in the Memory Model. However, because we used Alloy (see Sect. 5) we 
were unable to show full logical entailment, as Alloy can only reason about a 
finite number of events. So we instead showed that for finite sets of events up to 
a certain size, (i) and (ii) hold. In future work, we plan to explore using an SMT 
solver to see if we can prove unbounded entailment in some cases. When (i) or
(ii) does not hold, there is a bug in either the requirements or the formal modeling
of the requirements. To help debug problems with (i), we used the unsat core 
feature of Alloy, which identifies a subset of the constraints that are inconsistent. 
To further aid debugging, we labeled each constraint $c_i$ with a Boolean activation
variable $av_i$ (i.e., we replaced $c_i$ with $(av_i \Rightarrow c_i) \land av_i$). This allowed us to
inspect the unsat core for activation variables and immediately discern which 
constraints were active in producing the unsatisfiable result. 
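The activation-variable idea can be illustrated without Alloy. The sketch below is our own toy, not Alloy's unsat-core machinery: constraints are Python predicates over a few Boolean variables, satisfiability is checked by enumeration, and a constraint whose activation variable is switched off holds vacuously, mirroring $(av_i \Rightarrow c_i)$. A deletion-based loop then turns activation variables off one at a time and keeps off every constraint not needed for unsatisfiability, leaving a minimal core. The constraint names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Set

Assignment = Dict[str, bool]
Constraint = Callable[[Assignment], bool]


def satisfiable(constraints: Dict[str, Constraint], variables: List[str],
                active: Dict[str, bool]) -> bool:
    """Brute-force SAT check; inactive constraints hold vacuously."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all(not active[name] or check(assignment)
               for name, check in constraints.items()):
            return True
    return False


def unsat_core(constraints: Dict[str, Constraint],
               variables: List[str]) -> Optional[Set[str]]:
    """Deletion-based minimal core; None if the constraints are consistent."""
    active = {name: True for name in constraints}
    if satisfiable(constraints, variables, active):
        return None
    for name in constraints:
        active[name] = False      # try switching av_name off
        if satisfiable(constraints, variables, active):
            active[name] = True   # needed for unsatisfiability: switch it back on
    return {name for name, on in active.items() if on}


toy = {
    "c1": lambda a: a["x"],
    "c2": lambda a: not a["x"],
    "c3": lambda a: a["y"],
}
print(sorted(unsat_core(toy, ["x", "y"])))  # ['c1', 'c2']
```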


4.2 Implementation Testing 


The Implementation testing phase analyzes whether a specific JavaScript engine 
correctly implements the ECMAScript Memory Model. In particular, given a 
program with shared memory operations, we generate: (1) the set of valid exe- 
cutions, (2) a litmus test, and (3) behavioral coverage constraints. 


Valid Executions. This analysis lists all of (and only) the behaviors that the 
(provided) program can exhibit that are consistent with the Memory Model 
specification. The encoding of the problem is based on the following definition: 


$$VE(E, AO) := \{(RBF, HB, MO, SW) \mid MM(E, AO, RF, RBF, SW, HB, MO) \text{ is SAT}\}$$


where VE(E, AO) is the complete (and finite because the program itself is 
finite) set of possible assignments to the RBF, HB, MO, and SW relations. Each 
assignment corresponds to a valid execution. 


Litmus Tests. Litmus test generation uses the generated list of valid executions 
to construct a JavaScript program enriched with an assertion that is violated if 
the output of the program does not match any of the valid executions. A litmus 
test is executed multiple times (e.g., millions), in order to increase the chance of 
exposing a problem if there is one. 

The result of running a litmus test many times can (in general) have one of 
three outcomes: the assertion is violated at least once, the assertion is not vio- 
lated and all possible executions are observed, and the assertion is not violated 
and only some of the possible executions are observed. More specifically, given
a program $P$, the set of its valid executions $VE(P)$, and the set of concrete
executions $E_N(P)$ (obtained by running the JavaScript program on a given engine
some number of times $N$), the possible results can be respectively expressed as
$E_N(P) \setminus VE(P) \neq \emptyset$, $E_N(P) = VE(P)$, and $E_N(P) \subset VE(P)$.
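The three outcomes can be read off mechanically once observed and valid executions are available as sets. In this sketch executions are opaque hashable labels (an assumption; EMME's actual representation is not shown here), and the coverage ratio is the kind of measure behind the "1/5 the size" example in the introduction.

```python
from typing import Hashable, Set


def classify(observed: Set[Hashable], valid: Set[Hashable]) -> str:
    """Classify a litmus-test campaign by comparing E_N(P) with VE(P)."""
    if observed - valid:
        return "violation"   # E_N(P) \ VE(P) is non-empty
    if observed == valid:
        return "complete"    # every valid execution was observed
    return "partial"         # E_N(P) is a strict subset of VE(P)


def coverage(observed: Set[Hashable], valid: Set[Hashable]) -> float:
    """Fraction of valid executions that were actually observed."""
    if not valid:
        raise ValueError("VE(P) must be non-empty")
    return len(observed & valid) / len(valid)


valid = {"e1", "e2", "e3", "e4", "e5"}   # hypothetical labels
print(classify({"e1"}, valid), coverage({"e1"}, valid))  # partial 0.2
```

A "partial" result with low coverage is the case that motivates the behavioral coverage constraints.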


Behavioral Coverage Constraints. Though they can expose bugs, the litmus
tests do not provide a guarantee of implementation correctness. In fact,
even when a “bug” is found, it could be that the specification is too tight (i.e., it
is incompatible with some intended behaviors) rather than that the implementation
is wrong. On the other hand, when $E_N(P) \subset VE(P)$, and especially if the
cardinality of $E_N(P)$ is significantly smaller than that of $VE(P)$, it might be the
case that the implementation is too simple: it is not taking sufficient advantage
of the weak memory model and is therefore unnecessarily inefficient.

Whenever $E_N(P) \subset VE(P)$, this situation can be analyzed by the generation
of Behavioral Coverage Constraints. The goal of this analysis is to synthesize the
formulae $\Sigma_{OBS}$ and $\Sigma_{UNOBS}$, for observed and unobserved outputs, that restrict
the behavior of the memory model in order to match $E_N(P)$ and $VE(P) \setminus E_N(P)$.

Our approach to doing this relies on first choosing a set $\Pi = \{\pi_1, \ldots, \pi_n\}$ of
predicates over which the formula will be constructed. One choice for $\Pi$ might
be all atomic predicates appearing in Eq. (1). Now, let $\Delta(\Pi)$ be the set of all
cubes of size $n$ over $\Pi$. Formally,

$$\Delta(\Pi) = \{\, l_1 \land \cdots \land l_n \mid \forall\, 1 \le i \le n.\ l_i \in \{\pi_i, \lnot\pi_i\} \,\}.$$
Further, define the observed and unobserved executions as:

$$EX_{OBS} = \bigvee_{(RBF, HB, MO, SW) \in E_N(P)} (RBF \land HB \land MO \land SW)$$
$$EX_{UNOBS} = \bigvee_{(RBF, HB, MO, SW) \in VE(P) \setminus E_N(P)} (RBF \land HB \land MO \land SW)$$


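We compute those cubes in $\Delta(\Pi)$ that are consistent with the observed and
unobserved executions as follows:

$$\Delta_{OBS}(\Pi) = \{\delta \in \Delta(\Pi) \mid MM \land EX_{OBS} \land \delta \text{ is satisfiable}\}$$
$$\Delta_{UNOBS}(\Pi) = \{\delta \in \Delta(\Pi) \mid MM \land EX_{UNOBS} \land \delta \text{ is satisfiable}\}$$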


The cubes are then combined to generate the formulae for matched and
unmatched executions:

$$\Sigma_{OBS} = \bigvee_{\delta \in \Delta_{OBS}} \delta \qquad \Sigma_{UNOBS} = \bigvee_{\delta \in \Delta_{UNOBS}} \delta.$$
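Under a simplifying assumption, the cube construction reduces to enumeration. If every predicate in $\Pi$ is fully determined by the relations of a single execution, and $E_N(P) \subseteq VE(P)$, then $MM \land EX_{OBS} \land \delta$ is satisfiable exactly when $\delta$ evaluates to true on at least one observed execution (similarly for unobserved ones). The sketch below represents executions as hypothetical maps from predicate names to truth values, enumerates $\Delta(\Pi)$, filters it, and prints the resulting disjunctions; with the single predicate R2H of the next example it reproduces $\Sigma_{OBS} = R2H$ and $\Sigma_{UNOBS} = \lnot R2H$.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

Cube = Tuple[Tuple[str, bool], ...]   # ((predicate, polarity), ...)
Execution = Dict[str, bool]           # predicate name -> truth value


def cubes(predicates: List[str]) -> Iterable[Cube]:
    """Delta(Pi): every conjunction choosing pi_i or not pi_i for each i."""
    for signs in product([True, False], repeat=len(predicates)):
        yield tuple(zip(predicates, signs))


def consistent(predicates: List[str], executions: List[Execution]) -> Set[Cube]:
    """Cubes that hold on at least one of the given executions."""
    return {cube for cube in cubes(predicates)
            if any(all(ex[p] == sign for p, sign in cube) for ex in executions)}


def render(cube_set: Set[Cube]) -> str:
    """Sigma as a readable disjunction (& for and, | for or, ~ for not)."""
    terms = [" & ".join(p if sign else "~" + p for p, sign in cube)
             for cube in sorted(cube_set)]
    return " | ".join(terms) if terms else "false"


observed = [{"R2H": True}, {"R2H": True}]
unobserved = [{"R2H": False}]
print(render(consistent(["R2H"], observed)))    # R2H
print(render(consistent(["R2H"], unobserved)))  # ~R2H
```

Since $|\Delta(\Pi)| = 2^n$, the choice of a small, relevant $\Pi$ matters in practice.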


For example, let $(R2H := \forall e_1, e_2 \in E : RF(e_1, e_2) \Rightarrow HB(e_1, e_2)) \in \Pi$ be a predicate expressing that every tuple in Reads From is also in Happens Before. If the behavioral coverage constraints analysis generates $\Psi_{OBS} = R2H$ and $\Psi_{UNOBS} = \neg R2H$, the JavaScript engine always aligns the Reads From relation with the HB relation. This identifies a possible path for optimization that takes advantage of the (weak) memory model.
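A small explicit-state sketch in Python illustrates the cube construction. It is not EMME's implementation and rests on three assumptions. First, the executions are already enumerated, so each one is a model of MM. Second, an execution is a dictionary of relations stored as sets of event pairs. Third, every predicate in Π inspects only the relations that $EX_{OBS}$ or $EX_{UNOBS}$ fix. Under these assumptions, checking that $MM \wedge EX \wedge \delta$ is satisfiable reduces to finding one execution that satisfies the cube δ. The helper names and the two toy executions are hypothetical.

```python
from itertools import product

# Hypothetical explicit-state sketch: an execution is a dict of relations
# (sets of event-id pairs); each predicate in Pi is a function over one execution.

def all_cubes(predicates):
    """Delta(Pi): every cube l_1 & ... & l_n with l_i in {pi_i, !pi_i}."""
    names = sorted(predicates)
    for signs in product((True, False), repeat=len(names)):
        yield tuple(zip(names, signs))

def cube_holds(cube, predicates, execution):
    """True if the execution satisfies every literal of the cube."""
    return all(predicates[name](execution) == sign for name, sign in cube)

def consistent_cubes(predicates, executions):
    """Cubes satisfied by at least one execution of the given set; this stands
    in for the check 'MM & EX & delta is satisfiable'."""
    return [cube for cube in all_cubes(predicates)
            if any(cube_holds(cube, predicates, ex) for ex in executions)]

def disjunction(cubes):
    """Psi = disjunction of the selected cubes, rendered as text."""
    if not cubes:
        return "false"
    terms = (" & ".join(n if s else "!" + n for n, s in cube) for cube in cubes)
    return " | ".join(f"({t})" for t in terms)

# R2H: every Reads From tuple is also a Happens Before tuple.
predicates = {"R2H": lambda ex: ex["RF"] <= ex["HB"]}
observed = [{"RF": {("w1", "r1")}, "HB": {("w1", "r1")}}]
unobserved = [{"RF": {("w2", "r1")}, "HB": {("w1", "r1")}}]

psi_obs = disjunction(consistent_cubes(predicates, observed))
psi_unobs = disjunction(consistent_cubes(predicates, unobserved))
print(psi_obs, psi_unobs)  # (R2H) (!R2H)
```

With n predicates the sketch enumerates all $2^n$ cubes, which is only reasonable for a small Π. A solver-based AllSAT call avoids materializing cubes that cannot be satisfied.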
5 Alloy Formalization


Alloy is a widely used modeling language that can be used to describe data struc- 
tures. The Alloy language is based on relational algebra and has been successfully 
used in many applications, including the analysis of memory models [14]. 

We used Alloy to formalize the memory model discussed in Sect. 3.1. We followed the formalization given in Definition 1, using sets and relations to represent each concept.¹ For instance, an operation_type is defined as an (abstract) set with three disjoint subsets (R for Read, W for Write, and M for ReadModifyWrite), one for each possible operation. In contrast, blocks and bytes are represented as sets. A memory operation is modeled as a relation which links all of the attributes necessary to describe a memory event.


6.3.1.14 happens-before 


4. For each pair of events E and D in EventSet(execution): 
a. If E is agent-order before D then E happens-before D. 
b. If E synchronizes-with D then E happens-before D. 
c. ...


Fig. 3. Excerpt of the Happens Before definition [10] 


The formalization of a natural language specification usually requires mul- 
tiple attempts and iterations before the intended semantics become clear. In 
the case of the ECMAScript Memory Model, this process was crucial for dis- 
ambiguating some of the stated constraints. An example is the Happens Before 
relation. Figure 3 shows an excerpt of its definition, expressing how it is related 
to the Agent Order and Synchronizes With relations. One might expect the formal interpretation to be something like: $\forall (e_1, e_2).\ (AO(e_1, e_2) \Rightarrow HB(e_1, e_2)) \wedge (SW(e_1, e_2) \Rightarrow HB(e_1, e_2)) \wedge (\ldots)$


fact hb_def {all ee, ed : mem_events | Active2[ee, ed] =>
    (HB[ee, ed] <=> ((ee != ed) and (AO[ee, ed] or SW[ee, ed] or ...)))}


Fig. 4. Excerpt of the Happens Before definition 


However, further analysis and discussions with the people responsible for the Memory Model revealed that the correct interpretation is: $\forall (e_1, e_2).\ HB(e_1, e_2) \Rightarrow (AO(e_1, e_2) \vee SW(e_1, e_2) \vee \ldots)$. Figure 4 shows the Alloy formalization of the Happens Before relation. The Active2 predicate evaluates to true when both events are active.
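A brute-force count makes the difference between the two readings concrete. The Python sketch below is a simplification, not the Alloy model. It keeps only the AO and SW disjuncts and omits the remaining ones (elided as ... in Fig. 4), so the resulting HB is not transitively closed. It also treats all events as active and uses a hypothetical three-event example. The sketch counts how many candidate HB relations each reading admits.

```python
from itertools import chain, combinations

# Simplified, hypothetical setting: three active events, only AO and SW.
EVENTS = ("e1", "e2", "e3")
ALL_PAIRS = [(a, b) for a in EVENTS for b in EVENTS]
AO = {("e1", "e2")}
SW = {("e2", "e3")}

def naive_reading(hb):
    """forall (e1, e2). (AO => HB) and (SW => HB): only a lower bound on HB."""
    return (AO | SW) <= hb

def fig4_reading(hb):
    """forall (e1, e2). HB <=> (e1 != e2 and (AO or SW)), as in hb_def."""
    return all(((a, b) in hb) == (a != b and ((a, b) in AO or (a, b) in SW))
               for a, b in ALL_PAIRS)

def candidate_relations(pairs):
    """Every subset of the given pairs, i.e. every candidate HB relation."""
    subsets = chain.from_iterable(combinations(pairs, r) for r in range(len(pairs) + 1))
    return [set(s) for s in subsets]

candidates = candidate_relations(ALL_PAIRS)
naive = [hb for hb in candidates if naive_reading(hb)]
exact = [hb for hb in candidates if fig4_reading(hb)]
print(len(candidates), len(naive), len(exact))  # 512 128 1
```

Under the naive reading, any of the 128 supersets of AO ∪ SW is an admissible HB, including supersets with reflexive pairs. The bi-implication leaves exactly one.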


1 The complete Alloy model is available at https://github.com/FMJS/EMME/blob/master/model/memory_model.als.
Once the Memory Model has been formalized, the next step is to combine it with the encoding of the program under analysis. This requires modeling the memory events present in each thread. In the Alloy model, each event in a program extends the set of memory events, and its values are defined as a series of facts. Figure 5 shows an example of the Alloy model for the event ev5_W_t3 from Fig. 1. Notably, the activation of this event depends on the value of id1_cond, which symbolically represents the condition of the if-then-else statement.


one sig ev5_W_t3 extends mem_events {}
fact ev5_W_t3_def {(ev5_W_t3.O = W) and
    (ev5_W_t3.T = NT) and
    (ev5_W_t3.R = U) and
    (ev5_W_t3.M = {byte_0}) and
    ((ev5_W_t3.A = ENABLED) <=> ((id1_cond.value = TRUE))) and
    (ev5_W_t3.B = x)}
fact ev5_W_t3_in_mem_events {ev5_W_t3 in mem_events}


Fig. 5. Event ev5_W_t3 encoding (w.r.t. Fig. 1)


6 Implementation 


The techniques described in this paper have been implemented in a tool called 
EMME: ECMAScript Memory Model Evaluator [15]. The tool is written in 
Python, is open source, and its usage is regulated by a modified BSD license. 
The input to EMME is a program with shared memory accesses. The tool interacts with the Alloy Analyzer [13] to perform the formal analyses described in Sect. 4, which include the enumeration of valid executions and the generation of behavioral coverage constraints.


Input Format and Encoding. The input format of EMME uses a simplified JavaScript-like syntax. It supports the definition of Read, Write, and ReadModifyWrite events, allows events to be atomic or not atomic, and supports operations on integer or floating point values. The input format also supports if-then-else and bounded for-loop statements, as well as parametric values. Figure 6 shows an example of an input program. The program is encoded in Alloy and combined with the memory model in order to provide the input formula for the formal analyses.

[Listing: 14-line EMME program beginning var x = new SharedArrayBuffer(); Thread t1 { ...]

Fig. 6. EMME input for the program from Fig. 1.


Generation of All Valid Executions. EMME computes all valid executions by using Alloy to solve the AllSAT problem. In this case, the distinguishing models of the formula are the assignments to the RBF relation. Thus, after each satisfiability check of the Alloy Analyzer, an additional constraint is added in order to block the current assignment to the RBF relation. This procedure is repeated until the formula becomes unsatisfiable.
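The blocking loop can be sketched in Python, with a brute-force search standing in for the Alloy Analyzer. The sketch makes two assumptions: constraints are Python predicates over a dictionary of Boolean variables, and the RBF relation is encoded by one Boolean per candidate tuple. The function names (`solve`, `all_sat`) and variable names are hypothetical. The sketch blocks only the projection onto the RBF variables, so models that differ only in auxiliary variables are reported once.

```python
from itertools import product

def solve(variables, constraints):
    """Brute-force stand-in for one solver call: a model or None (unsat)."""
    for values in product((False, True), repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(constraint(model) for constraint in constraints):
            return model
    return None

def all_sat(variables, constraints, projection):
    """Enumerate distinct assignments to the projected (RBF) variables."""
    constraints = list(constraints)
    found = []
    while (model := solve(variables, constraints)) is not None:
        rbf = {v: model[v] for v in projection}
        found.append(rbf)
        # Blocking constraint: some RBF variable must differ from this assignment.
        constraints.append(lambda m, rbf=rbf: any(m[v] != val for v, val in rbf.items()))
    return found

# Hypothetical example: read r1 takes its byte from exactly one of w1, w2;
# "aux" is an unrelated variable that must not duplicate executions.
variables = ["rbf_r1_w1", "rbf_r1_w2", "aux"]
exactly_one = lambda m: m["rbf_r1_w1"] != m["rbf_r1_w2"]
executions = all_sat(variables, [exactly_one], ["rbf_r1_w1", "rbf_r1_w2"])
print(len(executions))  # 2
```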

As described in Sect. 3.1, our formal model does not encode the concrete values of each memory operation. Extracting a valid execution from a satisfying assignment to the formula therefore requires an additional step: reconstructing the values of each read or modify operation from the program and the assignment to the RBF relation. For example, consider the program in Fig. 1, and assume that the RBF relation contains the tuples (ev3_R_t2, ev2_W_t2, 0) and (ev3_R_t2, ev6_W_t3, 1). The value read by ev3_R_t2 then depends on two writes: ev2_W_t2 writes 1 with an 8-bit integer encoding at position 0, and ev6_W_t3 writes 3 at position 1. Byte 0 and byte 1 from those two writes are decoded as a 16-bit integer for the event ev3_R_t2, giving a read of the value 769 (with little-endian byte order, 1 + 3 · 256 = 769). Each event could also have a different size and format (i.e., integer, unsigned integer, or float), and the reconstruction of the correct value must take this into account.
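Python's `struct` module is enough to sketch the value reconstruction. This is an illustration, not EMME's code. It assumes little-endian byte order (as on x86), represents each write as its starting byte index plus its encoded bytes, and maps each byte index read by an event to the write it reads from. The helper names are hypothetical. The example reproduces the 769 computation above.

```python
import struct

# Typed-array element formats, little-endian ("<"); single-byte ones need no order.
FORMATS = {"Int8": "b", "Uint8": "B", "Int16": "<h", "Uint16": "<H",
           "Int32": "<i", "Uint32": "<I", "Float32": "<f", "Float64": "<d"}

def encode(elem_type, value):
    """Bytes produced by a write of the given element type."""
    return struct.pack(FORMATS[elem_type], value)

def read_value(elem_type, first_byte, rbf, writes):
    """Decode a read from the bytes selected by the RBF assignment.
    rbf: byte index -> write id; writes: write id -> (start byte, bytes)."""
    fmt = FORMATS[elem_type]
    raw = bytearray()
    for index in range(first_byte, first_byte + struct.calcsize(fmt)):
        start, data = writes[rbf[index]]
        raw.append(data[index - start])  # offset of this byte inside that write
    return struct.unpack(fmt, bytes(raw))[0]

writes = {"ev2_W_t2": (0, encode("Int8", 1)),   # x-I8[0] := 1
          "ev6_W_t3": (1, encode("Int8", 3))}   # x-I8[1] := 3
rbf_ev3 = {0: "ev2_W_t2", 1: "ev6_W_t3"}
print(read_value("Int16", 0, rbf_ev3, writes))  # 769
```

The same loop handles unsigned and floating-point reads by changing the format code, which covers the size and format dependence mentioned above.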

When interpreting a program containing if-then-else statements, the possible outcomes must be filtered to exclude executions that break the semantics of if-then-else. In particular, the Boolean condition in the model might not match the concrete value implied by the read values. For instance, consider the example in Fig. 6, in which the conditional is encoded as a Boolean variable id1_cond representing the statement x-I8[0] == 1. The tool may assign id1_cond to true even though, based on the information in the RBF relation, the read of x-I8[0] turns out to return a value different from 1. In this case, the execution is discarded, since it is not possible under the semantics of the if-then-else statement.
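The filtering step amounts to a consistency check between the Boolean chosen for the condition and the concretely reconstructed read value. The Python sketch assumes a condition of the form read == constant and uses a hypothetical dictionary layout for candidate executions.

```python
def consistent(execution, cond_var, read_event, expected):
    """The symbolic condition must equal the concrete test read == expected."""
    return execution["conds"][cond_var] == (execution["reads"][read_event] == expected)

def filter_branches(executions, cond_var, read_event, expected):
    """Keep only executions whose branch choice matches the value actually read."""
    return [ex for ex in executions if consistent(ex, cond_var, read_event, expected)]

# Hypothetical candidates for the condition x-I8[0] == 1 (read by ev4_R_t3).
candidates = [
    {"conds": {"id1_cond": True},  "reads": {"ev4_R_t3": 1}},  # THEN, reads 1: kept
    {"conds": {"id1_cond": False}, "reads": {"ev4_R_t3": 0}},  # ELSE, reads 0: kept
    {"conds": {"id1_cond": True},  "reads": {"ev4_R_t3": 0}},  # THEN, reads 0: discarded
]
kept = filter_branches(candidates, "id1_cond", "ev4_R_t3", 1)
print(len(kept))  # 2
```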


Graph Representation of the Results. For each valid execution, EMME produces a graphviz file that gives a graphical representation of the assignments to the main relations and of the read values. Figure 7 shows an example of this graphical representation. The default setup removes some redundant information such
as the explicit transitive closure of the HB relation, while RF and AO are not 
represented, and the total order MO is reported in the top right corner. Black 
arrows are used to represent the HB relation, while red and blue are respec- 
tively used for RBF and SW. Figure 7(a) represents an execution where event 
ev4_R_t3 reads value 1 from ev2_W_t2, thus executing the THEN branch in the 
if-then-else statement. In contrast, Fig. 7(b) reports an execution where it reads 
0, thus taking the ELSE branch. 


Litmus Test Generation. The generation of all valid executions also constructs a 
JavaScript litmus test that can be used to evaluate whether the engine respects 
the semantics of the Memory Model. The structure of the litmus test mirrors that 
of the input program, but the syntax follows the official TEST262 ECMAScript 
conformance standard [11]. 

To check whether a test produced a valid result, the results of memory operations must be collected. The basic idea is to print the values of each read and collect them at the thread level. The main thread then collects all the results, and an assertion compares the sorted report with the set of expected outputs. Moreover, the test contains a section that lists the expected outputs and is parsed by the Litmus script provided with the EMME tool. The Litmus script facilitates running the same test multiple times. It provides a summary of the results, and it warns whenever an observed execution is not valid according to the standard.

[Graph panels (a) Interpretation 1 (THEN) and (b) Interpretation 2 (ELSE); Memory Order (a): ev1_W_t1, ev2_W_t2, ev3_R_t2, ev4_R_t3, ev5_W_t3; (b): ev1_W_t1, ev2_W_t2, ev3_R_t2, ev4_R_t3, ev6_W_t3]

Fig. 7. Memory model interpretations of the program in Fig. 6.
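The following sketch produces a summary in the spirit of the Litmus script. The script's actual format is not described here, so the sketch assumes that each run yields one output string and that the expected outputs are the outputs of the valid executions. The function `summarize` and the output strings are hypothetical.

```python
from collections import Counter

def summarize(run_outputs, expected):
    """Tally repeated runs, compute output coverage, and list invalid outputs."""
    counts = Counter(run_outputs)
    observed = set(counts)
    invalid = sorted(observed - expected)  # outputs the model does not allow
    coverage = len(observed & expected) / len(expected)
    return counts, coverage, invalid

runs = ["r1=1;r2=0", "r1=1;r2=0", "r1=0;r2=0", "r1=1;r2=1"]
expected = {"r1=0;r2=0", "r1=1;r2=0", "r1=0;r2=1"}
counts, coverage, invalid = summarize(runs, expected)
print(f"coverage: {coverage:.0%}")
for output in invalid:
    print(f"WARNING: {output} is not valid according to the model")
```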


Generation of the Behavioral Coverage Constraints. As described above, each assignment to the RBF relation determines a concrete value for each memory event. Thus, for each RBF assignment in a set of valid executions for a given program, we can determine the output of the corresponding litmus test. By running the litmus test many times on a JavaScript engine, we can determine which assignments to the RBF relation have been matched. We denote these $MA\_rbf_1, \ldots, MA\_rbf_k$. The unmatched assignments to RBF are then obtained by removing the matched ones from the set of all valid executions. We denote the unmatched ones $UN\_rbf_1, \ldots, UN\_rbf_h$.
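Once each RBF assignment is mapped to the output it produces, the split into matched and unmatched assignments is a set difference. In this hypothetical Python sketch, RBF assignments are opaque identifiers, and an assignment counts as matched whenever its output was observed. Distinct RBF assignments can yield the same output, and the sketch assumes that all of them then count as matched.

```python
def split_matched(output_of, observed_outputs):
    """Return (MA_rbf, UN_rbf) as sorted lists of RBF-assignment ids.
    output_of maps each valid RBF assignment to its litmus output."""
    matched = sorted(r for r, out in output_of.items() if out in observed_outputs)
    unmatched = sorted(set(output_of) - set(matched))
    return matched, unmatched

output_of = {"rbf1": "x=1", "rbf2": "x=1", "rbf3": "x=769", "rbf4": "x=0"}
ma_rbf, un_rbf = split_matched(output_of, observed_outputs={"x=1", "x=0"})
print(ma_rbf, un_rbf)  # ['rbf1', 'rbf2', 'rbf4'] ['rbf3']
```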

As described in Sect. 4, generating separation constraints that distinguish between matched and unmatched executions first requires the definition of a set of predicates $\Pi$. The extraction of the separation constraints is based on an AllSAT call for matched and unmatched results. The former is shown in (2). It extracts all assignments to the predicates $\Pi$ such that the models of the RBF relation are consistent with $MA\_rbf_i$.


$$\mathrm{ALLSAT}_{\Pi}\Big[MM(E, AO, RBF, \ldots) \wedge (E = BE_E) \wedge (AO = BE_{AO}) \wedge \Big(\bigvee_{i=1,\ldots,k} RBF = MA\_rbf_i\Big)\Big] \tag{2}$$


Similarly, the evaluation for the unmatched executions performs an AllSAT analysis for the formula reported in (3). These two calls to the solver produce the formulae $\Psi_{OBS}$ and $\Psi_{UNOBS}$, respectively, as described in Sect. 4.
$$\mathrm{ALLSAT}_{\Pi}\Big[MM(E, AO, RBF, \ldots) \wedge (E = BE_E) \wedge (AO = BE_{AO}) \wedge \Big(\bigvee_{i=1,\ldots,h} RBF = UN\_rbf_i\Big)\Big] \tag{3}$$


The results from the two AllSAT queries can then be processed with a BDD [8] package, which in most cases produces a smaller formula. After this step, the tool provides a set of formal comparisons between the two formulae, such as implication, intersection, and disjunction. These comparisons help explain the relation between $\Psi_{OBS}$ and $\Psi_{UNOBS}$.
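Truth tables over the predicates in Π can illustrate the comparisons, standing in for the BDD operations (a BDD represents the same Boolean function more compactly). In this Python sketch, the formulae are assumed to be Python functions over a dictionary of predicate values, and P2 is a hypothetical second predicate.

```python
from itertools import product

def models(formula, names):
    """Satisfying rows of the truth table of formula over the predicates."""
    return {row for row in product((False, True), repeat=len(names))
            if formula(dict(zip(names, row)))}

def compare(psi_obs, psi_unobs, names):
    """Implication, intersection, and disjunction checks between two formulae."""
    obs, unobs = models(psi_obs, names), models(psi_unobs, names)
    return {
        "obs_implies_unobs": obs <= unobs,
        "unobs_implies_obs": unobs <= obs,
        "intersection_empty": not (obs & unobs),
        "disjunction_valid": len(obs | unobs) == 2 ** len(names),
    }

names = ["R2H", "P2"]
report = compare(lambda v: v["R2H"], lambda v: not v["R2H"], names)
print(report)
```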


7 Experimental Evaluations 


In this section, we evaluate the performance of EMME over a set of programs, 
each containing up to 8 memory events. The analyses can be reproduced using 
the package available at [16]. 


Programs Under Analysis. In this work, we rely on programs from previous 
work [6] as well as handcrafted and automatically generated programs. The 
handcrafted examples are part of the EMME [15] distribution, and they cover 
a variety of different configurations with 1 to 8 memory events, if-statements, 
for-loops, and parametric definitions. 

The programs from previous work as well as the handcrafted examples cover 
an interesting set of examples, but provide no particular guarantees on the space 
of programs that are covered. To overcome this limitation, we implemented a 
tool that enumerates all possible programs of a fixed size, thus giving us the 
possibility of generating programs to entirely cover the space of configurations, 
given a fixed set of events. 

The sizes of the programs considered in this evaluation allow us to cover a 
representative variety of possible event interactions, while preserving a reason- 
able level of readability of the results. In fact, a program with 8 memory events 
can have hundreds of valid executions that often require extensive manual effort 
to understand. 


All Valid Executions. As described in Sect. 6, the generation of all valid executions is based on a single AllSAT procedure. Figure 8 shows a scalability evaluation when generating all valid executions of 1200 program instances, each with 3 to 8 memory events (200 programs for each configuration). The x-axis refers to the program number, ordered first by number of memory events and then by increasing execution time. The y-axis reports the execution time (in seconds on an Intel i7-6700 @ 3.4 GHz) on a logarithmic scale. The results show that the proposed approach can analyze programs with 7 memory events in under 10 s, which is responsive enough for small but informative programs.

[Plot: x-axis Program number (0 to 1200); y-axis Time in seconds, logarithmic scale]

Fig. 8. Generation of all valid executions (from 3 to 8 memory events).


Behavioral Coverage Constraints. For the coverage constraints analysis, we first 
extracted a subset of the 1200 tests, considering only the ones that could pro- 
duce at least 5 different outputs. There were 288 such tests. For each test, we 
ran the JavaScript engine 500 times, and performed an analysis using 11 predi- 
cates, each of which corresponds to a sub-part of the Memory Model, as well as 
some additional formulae. During this evaluation, the average computation time 
required to perform the behavioral coverage constraints analysis was 3.25 s, with
a variance of 0.37 s. 


8 Results of the Formal Analyses 


In this section we provide an overview of the results of the formal analyses for
the ECMAScript Memory Model. 


Circular relations definition. In the original Memory Model, a subset of the 
relations were specified using circular definitions. More specifically, using the notation a → b for “the definition of a depends on b”, the loop was Synchronizes With → Reads From → Reads Bytes From → Happens Before → Synchronizes With. Cyclic definitions can result in vacuous constraints, and in the case
of binary relations, this manifests as solutions with unconstrained tuples that 
belong to all relations involved in the cycle. In order to solve this problem, the 
definition of Reads Bytes From was changed so that it no longer depends on 
Happens Before. In addition, the memory model was extended with a property 
called Valid Coherent Reads that constrains the possible tuples belonging to the 
Reads Bytes From relation. 
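Circular definitions of this kind can be found mechanically by a depth-first search over a "depends on" graph. The Python sketch below encodes two graphs with abbreviated relation names: the loop reported above, and a repaired variant in which Reads Bytes From no longer depends on Happens Before. The helper `find_cycle` is hypothetical, and the Valid Coherent Reads property is not modeled.

```python
def find_cycle(depends_on):
    """Return one dependency cycle (first node repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in depends_on.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is on the current path
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in depends_on:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

original = {"SW": ["RF"], "RF": ["RBF"], "RBF": ["HB"], "HB": ["SW"]}
repaired = {"SW": ["RF"], "RF": ["RBF"], "RBF": [], "HB": ["SW"]}
print(find_cycle(original))  # ['SW', 'RF', 'RBF', 'HB', 'SW']
print(find_cycle(repaired))  # None
```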


Misalignment of the ComposeWriteEventBytes. The memory model 
defines a Reads Bytes From relation, and checks whether the tuples belonging 
to it are valid by relying on a function called ComposeWriteEventBytes. Given a list of writes, the ComposeWriteEventBytes function creates a vector of values
associated with a read event; however, the index for each write event was not 
correct, resulting in a misalignment w.r.t. the Reads Bytes From relation. An 
additional offset was added in order to fix the problem. 


Distinct events quantification. Another problem encountered while analyzing the ECMAScript memory model was caused by a series of inconsistent constraints. One example of inconsistency was in the definition of the Happens Before relation, which prescribes that for any two events $ev_1$ and $ev_2$ with overlapping ranges, whenever $ev_1$ is of type Init, $ev_2$ should be of a different type (i.e., not Init). However, no constraint stated that $ev_1$ and $ev_2$ have to be distinct, and whenever $ev_1$ and $ev_2$ are not distinct, this expression is unsatisfiable.
A similar inconsistency was found in the definition of the Memory Order relation. In this case, if the SW relation contains the pair $(ev_1, ev_2)$, and $(ev_1, ev_2) \in HB$, then MO should contain $(ev_1, ev_2)$. However, another constraint requires that no event $ev_3$ operating on the same memory addresses as $ev_2$ exists such that both $(ev_1, ev_3) \in MO$ and $(ev_3, ev_2) \in MO$. This constraint is false when $ev_1 = ev_2 = ev_3$, so the two constraints are inconsistent. Both the Happens Before and the Memory Order relations initially permitted any pair of elements to be related (including two equal elements). The solution was to only allow pairs of distinct events in these relations.

The definition of the Reads Bytes From relation stated that each read or modify event $ev_R$ is associated with a list of pairs of byte indices and write or modify events. The definition did not specifically preclude allowing modify events
to read from themselves. This does not cause any particular issues at the formal 
model level, but it is not clear what the implication at the JavaScript engine 
implementation level would be. In order to resolve this issue, the definition of 
the Reads Bytes From relation was modified to allow only events that are distinct 
to be related by Reads Bytes From. 


Outputs coverage on ECMAScript engines. As described in Sect. 4, the litmus test analysis can result in three possible outcomes: $E_n(P) \setminus VE(P) \neq \emptyset$ when the engine violates the specification, $E_n(P) = VE(P)$ when the engine matches the specification, and $E_n(P) \subset VE(P)$ when the engine is more restrictive than the specification. Typically, such an analysis is designed to find bugs in the software implementation of the memory model [4,6], focusing on the first case ($E_n(P) \setminus VE(P) \neq \emptyset$). However, in this project the last case was the most prevalent, with $E_n(P)$ significantly smaller than $VE(P)$.

For instance, when we ran the 288 examples with at least 5 possible out- 
puts (from Sect. 7) 1000 times for each combination of program and JavaScript 
engine, the overall output coverage reached 75%, but for 1/6 of the examples, 
the coverage did not exceed 50%, and some were even below 15%.²

This situation (frequently having far fewer observed behaviors than allowed 
behaviors) guided our development of alternative analyses, such as the genera- 
tion of the behavioral coverage constraints, to help developers understand the 
relationship between an engine’s implementation and the memory model specifi- 
cation. Future improvements of JavaScript engines will likely be less conservative, 
meaning that more behaviors will be covered. The tests produced in this project 
will be essential to ensure that no bugs are introduced. Currently, we are in the 
process of adapting the litmus tests so that they can be included as part of the official TEST262 test suite for the ECMAScript Memory Model.


9 Conclusion 


Extending JavaScript, the language used by nearly all web-based interfaces, to 
support shared memory operations warrants the use of extensive verification 
techniques. In this work, we have presented a tool that has been developed 


2 On an x86 machine, and with the latest version of the engines available on October 1st, 2017. 
in order to support the design and development of the ECMAScript Memory
Model. The formal analysis of the original specification allowed us to identify 
a number of potential issues and inconsistencies. The evaluation of the valid 
executions and litmus tests coverage analysis identified a conservative level of 
optimization in current engine implementations. This situation motivated us to 
develop a specific technique for understanding differences between the Memory 
Model specification and JavaScript engine implementations. 

Future extensions to this work will consider providing additional techniques 
to help developers improve code optimizations in JavaScript engines. Techniques 
such as the synthesis of equivalent programs, and automated value instantiation 
given a parametric program will provide additional analytical capabilities able to 
identify possible directions for code optimization. Moreover, we will also consider 
integration with other constraint solving engines in order to deal with more 
complex programs. 
Abstract. We present an algorithm and tool to convert derivations from
the powerful recently proposed PR proof system into the widely used 
DRAT proof system. The PR proof system allows short proofs without 
new variables for some hard problems, while the DRAT proof system is 
supported by top-tier SAT solvers. Moreover, there exist efficient, for- 
mally verified checkers of DRAT proofs. Thus our tool can be used to 
validate PR proofs using these verified checkers. Our simulation algo- 
rithm uses only one new Boolean variable and the size increase is at most 
quadratic in the size of the propositional formula and the PR proof. The 
approach is evaluated on short PR proofs of hard problems, including 
the well-known pigeon-hole and Tseitin formulas. Applying our tool to 
PR proofs of pigeon-hole formulas results in short DRAT proofs, linear 
in size with respect to the size of the input formula, which have been 
certified by a formally verified proof checker. 


1 Introduction 


Satisfiability (SAT) solvers are powerful tools for many applications in formal 
methods and artificial intelligence [3,9]. Arguably the most effective new techniques in recent years are based on inprocessing [21,25]: interleaving preprocessing techniques and conflict-driven clause learning (CDCL) [26]. Several powerful
inprocessing techniques, such as symmetry breaking [1,6] and blocked clause 
addition [23], do not preserve logical equivalence and cannot be expressed com- 
pactly using classical resolution proofs [30]. The RAT proof system [14] was 
designed to express such techniques succinctly and facilitate efficient proof val- 
idation. All top-tier SAT solvers support proof logging in the DRAT proof sys- 
tem [12], which extends the RAT proof system with clause deletion. 

More recently a ground-breaking paper [8] presented at TACAS’17 showed 
how to efficiently certify huge propositional proofs of unsatisfiability by proof 
checkers, which are formally verified by theorem provers, such as ACL2 [7], 
Coq [7,8], and Isabelle/HOL [24]. These developments are clearly a breakthrough in SAT solving. They allow us to trust the results of a highly tuned state-of-the-art SAT solver as much as claims deduced with proof-producing theorem provers. We can now use SAT solvers as part of such fully trusted proof-generating systems.

On the other hand, even more powerful proof systems can produce even smaller proofs. One goal in increasing the power of proof systems is to compactly cover existing reasoning techniques that are not yet covered, e.g., algebraic reasoning. Another is to provide a framework for investigating new inprocessing techniques. If proofs are required, then this is a necessary condition for solving certain formulas faster. However, it makes proof checking more challenging. The recently proposed PR proof system [17] (best paper at CADE’17) is such a generalization of the RAT proof system. In fact, it is an instance of the most general way of defining a clausal proof system based on clause redundancy.

There are short PR proofs without new variables for some hard formulas [17]. 
Some of them can be found automatically [18]. The PR proof system can therefore 
reveal new powerful inprocessing techniques. Short proofs for hard formulas in 
the RAT proof system likely require many new variables, making it difficult 
to find them automatically. The question whether PR proofs can efficiently be 
converted into proofs in the RAT and DRAT proof systems has been open. In 
this paper, we give a positive answer and present a conversion algorithm that in 
the worst case results in a quadratic blowup in size. Surprisingly, only a single
new Boolean variable is required to convert PR proofs into DRAT proofs. 

At this point, the only available checker for PR proofs is an unverified one, written in C. In order to increase the trust in the correctness of PR proofs, we
implemented a tool, called PR2DRAT, to convert PR proofs into DRAT proofs, 
which in turn can be validated using verified proof checkers. Thanks to various 
optimizations, the size increase during conversion is rather modest on available 
PR proofs, thereby making this a useful certification approach in practice. 


Contributions 


— We show that the RAT and DRAT proof systems are as strong as the recently 
introduced PR proof system by giving an efficient simulation algorithm of PR 
proofs by RAT and DRAT proofs. 

— We implemented a proof conversion tool including various optimizations, 
which allow us to obtain linear size DRAT proofs from PR proofs for the 
well-known pigeon-hole formulas. These new DRAT proofs are significantly 
smaller than the most compact known DRAT proof for these formulas. 

— We validated short PR proofs of hard formulas by converting them into DRAT 
proofs and certified these using a formally verified proof checker. 


Structure 


After preliminaries in Sect. 2 we elaborate on clausal proof systems in Sect. 3 
also taking the idea of deletion steps into account. Then Sect. 4 describes and 
analyzes our simulation algorithm. In Sect. 5 we present how to optimize our
new algorithm for special cases followed by alternative simulation algorithms in 
Sect. 6. Experiments are presented in Sect. 7 before we conclude with Sect. 8. 
2 Preliminaries


Below we present the most important background concepts related to this paper. 


Propositional Logic. Propositional formulas in conjunctive normal form (CNF) are the focus of this paper. A literal is either a variable $x$ (a positive literal) or the negation $\bar{x}$ of a variable $x$ (a negative literal). The complementary literal $\bar{l}$ of a literal $l$ is defined as $\bar{l} = \bar{x}$ if $l = x$ and $\bar{l} = x$ if $l = \bar{x}$. A clause $C$ is a disjunction of literals. A formula $F$ is a conjunction of clauses. For a literal, clause, or formula $\varphi$, $var(\varphi)$ denotes the variables in $\varphi$. We treat $var(\varphi)$ as a variable if $\varphi$ is a literal, and as a set of variables otherwise.


Satisfiability. An assignment is a (partial) function from a set of variables to the truth values 1 (true) and 0 (false). An assignment is total w.r.t. a formula if it assigns a truth value to all variables occurring in the formula. We extend a given $\alpha$ to an assignment over literals, clauses and formulas in the natural way. Let $\varphi$ be either a literal, clause or formula. Then $\varphi$ is satisfied if $\alpha(\varphi) = 1$ and falsified if $\alpha(\varphi) = 0$. Otherwise $\varphi$ is unassigned. In particular, $\bar{x}$ is satisfied if $x$ is falsified by $\alpha$ and vice versa. A clause is satisfied by $\alpha$ if it contains a literal that is satisfied by $\alpha$ and falsified if all its literals are falsified. Finally, a formula is satisfied by $\alpha$ if all its clauses are satisfied by $\alpha$. We often denote assignments by sequences of literals they satisfy. For instance, $x\bar{y}$ denotes the assignment that assigns 1 to $x$ and 0 to $y$. For an assignment $\alpha$, $var(\alpha)$ denotes the variables assigned by $\alpha$. Further, $\alpha_l$ denotes the assignment obtained from $\alpha$ by flipping the truth value of literal $l$, assuming it is assigned. A formula is satisfiable if there exists an assignment that satisfies it and unsatisfiable otherwise.
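These definitions translate directly into code if literals are written in the DIMACS convention: variable $x$ is a positive integer and its complement is the negated integer. An assignment can then be stored as the set of literals it satisfies, matching the sequence-of-literals notation above. The Python helpers below are an illustrative sketch with hypothetical names.

```python
def lit_value(alpha, lit):
    """1 if alpha satisfies lit, 0 if it falsifies lit, None if unassigned."""
    if lit in alpha:
        return 1
    if -lit in alpha:
        return 0
    return None

def clause_value(alpha, clause):
    values = [lit_value(alpha, l) for l in clause]
    if 1 in values:
        return 1
    if all(v == 0 for v in values):  # includes the empty clause
        return 0
    return None

def formula_value(alpha, formula):
    values = [clause_value(alpha, c) for c in formula]
    if all(v == 1 for v in values):
        return 1
    if 0 in values:
        return 0
    return None

def flip(alpha, lit):
    """alpha_l: alpha with the truth value of lit's variable flipped."""
    if lit in alpha:
        return (alpha - {lit}) | {-lit}
    if -lit in alpha:
        return (alpha - {-lit}) | {lit}
    raise ValueError("literal is unassigned")

x, y = 1, 2
alpha = {x, -y}  # the assignment written x y-bar above
print(clause_value(alpha, (-x, y)), flip(alpha, x))
```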


Formula Simplification. We denote the empty clause by $\bot$ and by $\top$ the valid and always satisfied clause. A clause is a tautology if it contains a literal $l$ and its negation $\bar{l}$. Given assignment $\alpha$ and clause $C$, we define $C|_\alpha = \top$ if $\alpha$ satisfies $C$; otherwise, $C|_\alpha$ denotes the result of removing from $C$ all the literals falsified by $\alpha$. For a formula $F$, we define $F|_\alpha = \{C|_\alpha \mid C \in F \text{ and } C|_\alpha \neq \top\}$. We say that an assignment $\alpha$ touches a clause $C$ if $var(\alpha) \cap var(C) \neq \emptyset$. A unit clause is a clause with only one literal. The result of applying the unit clause rule to a formula $F$ is the formula $F|_l$ where $(l)$ is a unit clause in $F$. The iterated application of the unit clause rule to a formula, until no unit clauses are left, is called unit propagation. If unit propagation yields the empty clause $\bot$, we say that it derived a conflict. Given two clauses $(l \vee C)$ and $(\bar{l} \vee D)$, their resolvent is $C \vee D$. If further $D \subseteq C$, self-subsuming literal elimination (SSLE) allows removing $l$ from $(l \vee C)$. Notice that $C$ is the resolvent of $(l \vee C)$ and $(\bar{l} \vee D)$. So an SSLE step can be seen as two operations, learning the resolvent $C$ followed by the removal of $(l \vee C)$, which is subsumed by $C$. The reverse of SSLE is self-subsuming literal addition (SSLA), which can add a literal $l$ to a clause $C$ in the presence of a clause $(\bar{l} \vee D)$ with $D \subseteq C$. The notion of SSLE first appeared in [10] and is a special case of asymmetric literal elimination (ALE), which in turn is the inverse of asymmetric literal addition (ALA) [16].
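The Python sketch below covers the reduct, unit propagation, and the resolvent used by SSLE. It uses DIMACS-style integer literals, clauses as tuples, and assignments as sets of satisfied literals. The helper names are hypothetical, and for simplicity the sketch rebuilds the formula at every step instead of using the watched-literal schemes of real solvers.

```python
def reduct(formula, alpha):
    """F|alpha: drop clauses alpha satisfies, remove literals alpha falsifies."""
    result = []
    for clause in formula:
        if any(l in alpha for l in clause):
            continue  # C|alpha = T is not kept
        result.append(tuple(l for l in clause if -l not in alpha))
    return result

def unit_propagate(formula):
    """Iterate the unit clause rule; return (assignment, conflict_found)."""
    alpha, current = set(), list(formula)
    while True:
        if () in current:
            return alpha, True  # empty clause derived
        units = [c[0] for c in current if len(c) == 1]
        if not units:
            return alpha, False
        alpha.add(units[0])
        current = reduct(current, {units[0]})

def resolvent(c, d, lit):
    """Resolvent of (lit v C) and (-lit v D) on lit, as a sorted tuple."""
    assert lit in c and -lit in d
    return tuple(sorted((set(c) - {lit}) | (set(d) - {-lit})))

x, y, z = 1, 2, 3
print(unit_propagate([(x,), (-x, y), (-y, z)]))  # ({1, 2, 3}, False)
# SSLE: (x v y v z) and (-x v y), where D = (y) is a subset of C = (y v z)
print(resolvent((x, y, z), (-x, y), x))  # (2, 3)
```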

Clause $C$ is blocked on literal $l \in C$ w.r.t. a formula $F$, if all resolvents of $C$ and $D \in F$ with $\bar{l} \in D$ are tautologies. If a clause $C \in F$ is blocked w.r.t. $F$, $C$ can be removed from $F$ while preserving satisfiability. If a clause $C \notin F$ is blocked w.r.t. $F$, then $C$ can be added to $F$ while preserving satisfiability.
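The blocked-clause condition can be checked directly from its definition. The Python sketch uses DIMACS-style literals, and its helper names are hypothetical. In the first example, the clause is blocked and can therefore be added without changing satisfiability. In the second, no literal blocks the clause, and adding it would in fact make the formula unsatisfiable.

```python
def is_tautology(clause):
    return any(-l in clause for l in clause)

def blocked_on(clause, lit, formula):
    """All resolvents of C with clauses D in F containing -lit are tautologies."""
    assert lit in clause
    return all(is_tautology((set(clause) - {lit}) | (set(d) - {-lit}))
               for d in formula if -lit in d)

def is_blocked(clause, formula):
    return any(blocked_on(clause, lit, formula) for lit in clause)

x, y = 1, 2
print(is_blocked((x, y), [(-x, -y)]))      # True: resolvent (y v -y) is a tautology
print(is_blocked((x, y), [(-x,), (-y,)]))  # False
```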


Formula Relations. Two formulas are logically equivalent if they are satisfied by the same assignments. Two formulas are satisfiability equivalent if they are either both satisfiable or both unsatisfiable. Given two formulas $F$ and $F'$, we denote by $F \models F'$ that $F$ implies $F'$, i.e., all assignments satisfying $F$ also satisfy $F'$. Furthermore, by $F \vdash_1 F'$ we denote that for every clause $(l_1 \vee \cdots \vee l_n) \in F'$, unit propagation on $F \wedge (\bar{l}_1) \wedge \cdots \wedge (\bar{l}_n)$ derives a conflict. If $F \vdash_1 F'$, we say that $F$ implies $F'$ through unit propagation. For example, $(x) \wedge (y) \vdash_1 (x \vee z) \wedge (y)$, since unit propagation of the unit clauses $(\bar{x})$ and $(\bar{z})$ derives a conflict with $(x)$, and unit propagation of $(\bar{y})$ derives a conflict with $(y)$.
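The relation $F \vdash_1 F'$ can be checked by running unit propagation once per clause of $F'$. The Python sketch below follows the definition literally, with DIMACS-style literals and hypothetical helper names, and reproduces the example above.

```python
def propagates_to_conflict(formula, units):
    """Unit propagation on formula plus the unit clauses; True on conflict."""
    alpha = set()
    for lit in units:
        if -lit in alpha:
            return True  # complementary unit clauses
        alpha.add(lit)
    changed = True
    while changed:
        changed = False
        for clause in formula:
            if any(l in alpha for l in clause):
                continue  # clause already satisfied
            free = [l for l in clause if -l not in alpha]
            if not free:
                return True  # every literal falsified: conflict
            if len(free) == 1:
                alpha.add(free[0])  # unit clause rule
                changed = True
    return False

def implies_up(f, g):
    """F |-_1 G: for each (l1 v ... v ln) in G, F & (-l1) & ... & (-ln) conflicts."""
    return all(propagates_to_conflict(f, [-l for l in clause]) for clause in g)

x, y, z = 1, 2, 3
F = [(x,), (y,)]
G = [(x, z), (y,)]
print(implies_up(F, G))  # True
```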


3  Clausal Proof Systems 


In this section, we introduce a formal notion of clause redundancy and demon- 
strate how it provides the basis for clausal proof systems. We start by introducing 
clause redundancy [22]: 


Definition 1. A clause $C$ is redundant w.r.t. a formula $F$ if $F$ and $F \cup \{C\}$ are satisfiability equivalent.


For instance, the clause $C = (x \vee y)$ is redundant w.r.t. $F = (\bar{x} \vee \bar{y})$ since $F$ and $F \cup \{C\}$ are satisfiability equivalent (although they are not logically equivalent). Since this notion of redundancy allows us to add redundant clauses to a formula without affecting its satisfiability, it gives rise to clausal proof systems.
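For tiny formulas, Definition 1 can be checked by brute force over all total assignments. This exponential Python sketch (hypothetical helper names, DIMACS-style literals) only illustrates the definition. The proof systems below instead check redundancy in polynomial time with the help of a witness. The sketch confirms the example above: $F$ and $F \cup \{C\}$ are both satisfiable but have different sets of models.

```python
from itertools import product

def models(formula, n_vars):
    """All total assignments over variables 1..n_vars that satisfy the CNF."""
    found = set()
    for bits in product((False, True), repeat=n_vars):
        alpha = frozenset(v if b else -v for v, b in zip(range(1, n_vars + 1), bits))
        if all(any(l in alpha for l in clause) for clause in formula):
            found.add(alpha)
    return found

def is_redundant(clause, formula, n_vars):
    """Definition 1: F and F u {C} are satisfiability equivalent."""
    return bool(models(formula, n_vars)) == bool(models(formula + [clause], n_vars))

x, y = 1, 2
F, C = [(-x, -y)], (x, y)
print(is_redundant(C, F, 2))               # True
print(models(F, 2) == models(F + [C], 2))  # False: not logically equivalent
```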


Definition 2. For $n \in \mathbb{N}$ a derivation of a formula $F_n$ from a formula $F_0$ is a sequence of $n$ triples $(d_1, C_1, \omega_1), \ldots, (d_n, C_n, \omega_n)$, where each clause $C_i$ for $1 \le i \le n$ is redundant w.r.t. $F_{i-1} \setminus \{C_i\}$ with $F_i = F_{i-1} \cup \{C_i\}$ if $d_i = 0$ and $F_i = F_{i-1} \setminus \{C_i\}$ if $d_i = 1$. The assignment $\omega_i$ acts as (arbitrary) witness of the redundancy of $C_i$ w.r.t. $F_{i-1}$ and we call the number $n$ of steps also the length of the derivation. A derivation is a refutation of $F_0$ if $d_n = 0$ and $C_n = \bot$. A derivation is a proof of satisfaction of $F_0$ if $F_n$ equals the empty formula.


If there exists such a derivation of a formula $F'$ from a formula $F$, then $F$ and $F'$
are satisfiability equivalent. Furthermore, a refutation of a formula $F$, as defined above,
obviously certifies the unsatisfiability of $F$, since any $F'$ containing the empty
clause is unsatisfiable. Note that at this point the $\omega_i$ are still placeholders
used in refinements, i.e., in the RAT and PR proof systems defined below, where
these $\omega_i$ are witnesses for the redundancy of $C_i$ w.r.t. $F_{i-1}$. In these specialized
proof systems this redundancy can be checked efficiently, i.e., in polynomial time
w.r.t. the size of $C_i$, $F_{i-1}$, and $\omega_i$.
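To make Definition 2 concrete, here is a brute-force replay of a derivation in Python. It assumes the DIMACS-style encoding used above, treats formulas as sets of clauses, and ignores the witnesses, which are placeholders at this level of generality; the function names are hypothetical, and real checkers never test redundancy semantically like this.

```python
from itertools import product

def satisfiable(formula):
    """Brute-force satisfiability check; only suitable for tiny formulas."""
    variables = sorted({abs(l) for c in formula for l in c})
    for bits in product([False, True], repeat=len(variables)):
        true_lits = {v if b else -v for v, b in zip(variables, bits)}
        if all(any(l in true_lits for l in c) for c in formula):
            return True
    return False

def replay_derivation(f0, steps):
    """Replay triples (d, C, w): d = 0 adds C, d = 1 deletes C.
    Each C must be redundant w.r.t. the current formula without C; w is unused."""
    f = {frozenset(c) for c in f0}
    for d, clause, _witness in steps:
        clause = frozenset(clause)
        base = f - {clause}
        assert satisfiable(base) == satisfiable(base | {clause}), "step is not redundant"
        f = base | {clause} if d == 0 else base
    return f

def is_refutation(steps):
    """A refutation ends with the addition of the empty clause."""
    d, clause, _ = steps[-1]
    return d == 0 and len(clause) == 0

F0 = [{1, 2}, {1, -2}, {-1, 2}, {-1, -2}]   # all four clauses over x = 1, y = 2
proof = [(0, {1}, None), (0, set(), None)]
final = replay_derivation(F0, proof)
print(is_refutation(proof), frozenset() in final)  # True True
```

A proof of satisfiability is the dual case: for example, deleting the only clause of $(x \lor y)$ leaves the empty formula.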
3.1 The RAT Proof System


The RAT proof system allows the addition of a redundant clause, which is a 
so-called resolution asymmetric tautology [21] (RAT, defined below). It can be 
efficiently checked whether a clause is a RAT. The following definition of RAT is 
equivalent to the original one in [21] based on resolvents using results from [17]. 


Definition 3. Let $F$ be a formula, $C$ a clause, and $\alpha$ the smallest assignment
that falsifies $C$. Then, $C$ is a resolution asymmetric tautology (RAT) with respect
to $F$ if there exists a literal $l \in C$ such that $F|_\alpha \vdash_1 F|_{\alpha_l}$. We say that $C$ is a
RAT on $l$ w.r.t. $F$. The empty clause $\bot$ is a RAT w.r.t. $F$ iff $F \vdash_1 \bot$.


Informally, $F|_\alpha \vdash_1 F|_{\alpha_l}$ means that $F|_{\alpha_l}$ is at least as satisfiable as
$F|_\alpha$. We know that $\alpha_l$ satisfies $C$ as $l \in C$; thus $F|_{\alpha_l} = (F \land C)|_{\alpha_l}$. Hence, if
$F$ has a satisfying assignment $\beta$ that falsifies $C$, which necessarily is an extension
of $\alpha$, then it also satisfies $(F \land C)|_{\alpha_l}$, and thus there exists a satisfying assignment
of $F$ that satisfies $C$, obtained from $\beta$ by flipping the assigned value of $l$.


Example 1. Let $F = (x \lor y) \land (\bar{x} \lor y) \land (\bar{x} \lor z)$ and $C = (x \lor \bar{z})$. Then, $\alpha = \bar{x}z$
is the smallest assignment that falsifies $C$. Observe that $C$ is a RAT clause on
literal $x$ w.r.t. $F$. First, $\alpha_x = xz$. Now, consider $F|_\alpha = (y)$ and $F|_{\alpha_x} = (y)$.
Clearly, unit propagation on $F|_\alpha \land (\bar{y})$ derives a conflict, thus $F|_\alpha \vdash_1 F|_{\alpha_x}$.


In a RAT derivation $(d_1, C_1, \omega_1), \dots, (d_n, C_n, \omega_n)$ all $d_i$'s are zero (additions). Let
$\alpha_i$ denote the smallest assignment that falsifies $C_i$, and let $l_i \in C_i$ be a literal on
which $C_i$ is a RAT w.r.t. $F_{i-1}$. Each witness $\omega_i$ in a RAT derivation equals
$(\alpha_i)_{l_i}$, which is obtained from $\alpha_i$ by flipping the value of $l_i$.
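The RAT test of Definition 3 and the witness $(\alpha_i)_{l_i}$ can be sketched as follows, assuming the DIMACS-style encoding used above ($x = 1$, $y = 2$, $z = 3$); it reproduces Example 1. Only non-empty clauses are handled (the special case of $\bot$ is left out), and all names are hypothetical.

```python
def reduct(formula, alpha):
    """F|alpha: drop clauses satisfied by alpha, remove literals falsified by alpha."""
    return [frozenset(l for l in c if -l not in alpha)
            for c in formula if not any(l in alpha for l in c)]

def up_conflict(formula, units):
    """True iff unit propagation on formula plus the unit literals derives a conflict."""
    assigned = set(units)
    if any(-l in assigned for l in assigned):
        return True
    changed = True
    while changed:
        changed = False
        for c in formula:
            if any(l in assigned for l in c):
                continue
            free = [l for l in c if -l not in assigned]
            if not free:
                return True
            if len(free) == 1:
                assigned.add(free[0])
                changed = True
    return False

def implies_up(f, g):
    """F |-_1 G."""
    return all(up_conflict(f, [-l for l in d]) for d in g)

def rat_witness(clause, lit):
    """alpha_l: the smallest assignment falsifying clause, with the value of lit flipped."""
    alpha = {-l for l in clause}
    return (alpha - {-lit}) | {lit}

def is_rat_on(formula, clause, lit):
    """Definition 3 for a non-empty clause: F|alpha |-_1 F|alpha_l."""
    alpha = {-l for l in clause}
    return implies_up(reduct(formula, alpha), reduct(formula, rat_witness(clause, lit)))

F = [{1, 2}, {-1, 2}, {-1, 3}]   # Example 1
C = {1, -3}                      # (x or not z)
print(is_rat_on(F, C, 1))        # True
print(sorted(rat_witness(C, 1))) # [1, 3], i.e. alpha_x = xz
```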


3.2 The PR Proof System 


As discussed, addition of PR clauses (short for propagation-redundant clauses) 
to a formula can lead to short proofs for hard formulas without the introduction 
of new variables. Although PR as well as RAT clauses are not necessarily implied 
by the formula, their addition preserves satisfiability [17]. The intuitive reason 
for this is that the addition of a PR clause prunes the search space of possible 
assignments in such a way that there still remain assignments under which the 
formula is as satisfiable as under the pruned assignments. 


Definition 4. Let $F$ be a formula, $C$ a non-empty clause, and $\alpha$ the smallest
assignment that falsifies $C$. Then, $C$ is propagation redundant (PR) with respect
to $F$ if there exists an assignment $\omega$ which satisfies $C$, such that $F|_\alpha \vdash_1 F|_\omega$.


The clause $C$ can be seen as a constraint that “prunes” from the search space
all assignments that extend $\alpha$. Note again that in our setting assignments are in
general partial functions. Since $F|_\alpha$ implies $F|_\omega$, every assignment that satisfies
$F|_\alpha$ also satisfies $F|_\omega$, meaning that $F$ is at least as satisfiable under $\omega$ as it
is under $\alpha$. Moreover, since $\omega$ satisfies $C$, it must disagree with $\alpha$ on at least
one variable. We refer to $\omega$ as the witness, since it witnesses the propagation-
redundancy of the clause. Consider the following example from [17].
Example 2. Let $F = (x \lor y) \land (\bar{x} \lor y) \land (\bar{x} \lor z)$, $C = (x)$, and let $\omega = xz$ be
an assignment. Then, $\alpha = \bar{x}$ is the smallest assignment that falsifies $C$. Now,
consider $F|_\alpha = (y)$ and $F|_\omega = (y)$. Clearly, unit propagation on $F|_\alpha \land (\bar{y})$
derives a conflict. Thus, $F|_\alpha \vdash_1 F|_\omega$ and $C$ is propagation redundant w.r.t. $F$.
Notice that $C$ is not RAT w.r.t. $F$ as $(y) = F|_\alpha \nvdash_1 F|_{\alpha_x} = (y) \land (z)$.
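A corresponding sketch of Definition 4 in Python (same hypothetical DIMACS-style encoding, $x = 1$, $y = 2$, $z = 3$; the helpers are repeated so the block is self-contained) confirms Example 2: $C = (x)$ is PR with witness $\omega = xz$ but not RAT on $x$.

```python
def reduct(formula, alpha):
    """F|alpha: drop clauses satisfied by alpha, remove literals falsified by alpha."""
    return [frozenset(l for l in c if -l not in alpha)
            for c in formula if not any(l in alpha for l in c)]

def up_conflict(formula, units):
    """True iff unit propagation on formula plus the unit literals derives a conflict."""
    assigned = set(units)
    if any(-l in assigned for l in assigned):
        return True
    changed = True
    while changed:
        changed = False
        for c in formula:
            if any(l in assigned for l in c):
                continue
            free = [l for l in c if -l not in assigned]
            if not free:
                return True
            if len(free) == 1:
                assigned.add(free[0])
                changed = True
    return False

def implies_up(f, g):
    """F |-_1 G."""
    return all(up_conflict(f, [-l for l in d]) for d in g)

def is_pr(formula, clause, omega):
    """Definition 4: omega satisfies the non-empty clause and F|alpha |-_1 F|omega."""
    if not clause or not any(l in omega for l in clause):
        return False
    alpha = {-l for l in clause}
    return implies_up(reduct(formula, alpha), reduct(formula, omega))

def is_rat_on(formula, clause, lit):
    """Definition 3 on a given literal."""
    alpha = {-l for l in clause}
    return implies_up(reduct(formula, alpha), reduct(formula, (alpha - {-lit}) | {lit}))

F = [{1, 2}, {-1, 2}, {-1, 3}]
C, omega = {1}, {1, 3}
print(is_pr(F, C, omega))   # True
print(is_rat_on(F, C, 1))   # False: F|alpha_x = (y) and (z)
```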


Most known types of redundant clauses are PR clauses [17], including blocked 
clauses [23], set-blocked clauses [22], resolution asymmetric tautologies, etc. 


3.3 The Power of Deletion 


The clausal proof system DRAT [29] is the de-facto standard for proofs of
unsatisfiability (refutations) in practice. It extends RAT by allowing the deletion of
clauses. The main purpose of clause deletion is to reduce the computational cost of
validating proofs of unsatisfiability. Note that SAT solvers not only learn clauses,
but also aggressively delete clauses to speed up reasoning. Integrating deletion
information in proofs is crucial to speed up proof checking.

In principle, while deleted clause information has to be taken into account to 
update the formula after a deletion step, one does not need to check the validity of 
clause deletion steps in order to refute a propositional formula. Simply removing 
deleted clauses during proof checking trivially preserves unsatisfiability. 

Proofs of satisfiability only exist in proof systems that allow and enforce 
valid deletion steps, because they are required to reduce a formula to the empty 
formula. In case of propositional formulas, the notion of proofs of satisfiability is 
probably not useful as a satisfying assignment can be used to certify satisfiability. 
However, for richer logics, such as quantified Boolean formulas, the proof of 
satisfiability can be exponentially smaller compared to alternatives [19,20]. 


4 Conversion Algorithm 


This section presents our main algorithm, which describes how to convert a PR
derivation $(0, C_1, \omega_1), \dots, (0, C_n, \omega_n)$ of a formula $F_n$ from a formula $F_0$ into a
DRAT derivation $(d_1, D_1, \omega'_1), \dots, (d_m, D_m, \omega'_m)$ of $G_m = F_n$ from $G_0 = F_0$.


Each PR proof step adds a clause to the formula. Let $G_0$ be a copy of $F_0$ and
$F_i := F_{i-1} \land C_i$ for $1 \le i \le n$. Each proof step in a DRAT proof either deletes or
adds a clause depending on whether $d_j$ is 1 or 0 (respectively). For $1 \le j \le m$
we either have $G_j = G_{j-1} \setminus \{D_j\}$ if $d_j$ is 1 or $G_j := G_{j-1} \land D_j$ if $d_j$ is 0.

Each single PR derivation step $(0, C_i, \omega_i)$ is also a PR derivation of $F_i$ from
$F_{i-1}$, and our conversion algorithm simply translates each such PR derivation
step separately into a DRAT derivation of $F_i$ from $F_{i-1}$. The conversion of the
whole PR derivation is then obtained as the concatenation of these individual DRAT
derivations, which gives a DRAT derivation of $F_n$ from $F_0$. We will first offer an
informal top-down description of converting a single PR derivation step into a
sequence of DRAT steps.
4.1 Top-Down


Consider a formula $F$ and a clause $C$ which has PR w.r.t. $F$ with witness $\omega$, i.e., a
single PR derivation step. The central question addressed in this paper is how to
construct a DRAT derivation of $F \land C$ from $F$. The constructed DRAT derivation
$(d_1, C_1, \omega_1), \dots, (d_q, C_q, \omega_q), (d_{q+1}, C_{q+1}, \omega_{q+1}), \dots, (d_p, C_p, \omega_p)$ of $F \land C$ from $F$
consists of three parts. It also requires introducing a (new) Boolean variable $x$
that does not occur in $F$.


1. Construct a DRAT derivation $(d_1, C_1, \omega_1), \dots, (d_q, C_q, \omega_q)$ of $F'$ from $F$ where
a. the clause $(x \lor C)$ is a RAT on $x$ w.r.t. $F'$ and
b. there exists a DRAT derivation from $F' \land (x \lor C)$ to $F \land C$.

2. In step $q + 1$, clause $C_{q+1} = (x \lor C)$ is added to $F'$.

3. The steps after step $q + 1$ transform $F' \land (x \lor C)$ into $F \land C$.


Notice that $(x \lor C)$ is blocked w.r.t. $F$ (as $x$ does not occur in $F$) and could therefore
be added to $F$ as a first step. However, it is very hard to eliminate literal $x$ from
$F \land (x \lor C)$. Instead, we transform $F$ into $F'$ before the addition and reverse the
transformation afterwards. Below we describe the details of our simulation algorithm
in five phases, of which phases (I) and (II) correspond to the transformation (part 1.)
and phases (IV) and (V) correspond to the reverse transformation (part 3.).


4.2 Five Phases 


We will show how $F_{i+1}$ is derived from $F_i$ using the PR step $(0, C_{i+1}, \omega_{i+1})$ by
a sequence of $p$ DRAT proof steps from $G_j$ to $G_{j+p}$ such that $G_j = F_i$ and
$G_{j+p} = F_{i+1}$. In the description below, $F$ refers to $F_i$, $C$ refers to $C_{i+1}$, and $\omega$ refers
to $\omega_{i+1}$. Further, let $x$ be a new Boolean variable, i.e., $x$ does not occur in $F$. We
can assume that $\mathrm{var}(C) \subseteq \mathrm{var}(F)$. Otherwise there exists a literal $l \in C$ with
$\mathrm{var}(l) \notin \mathrm{var}(F)$. Thus $C$ is blocked on $l$ w.r.t. $F$ and can be added to $F$ using a
single RAT step.


(I) Add shortened copies of clauses that are reduced, but not satisfied by $\omega$.
The first phase of the conversion algorithm extends $F$ by adding the clauses
$(\bar{x} \lor D)$ with $D \in F|_\omega \setminus F$. The literal $x$ does not occur in $F$. All clauses
$(\bar{x} \lor D)$ are blocked on $\bar{x}$ w.r.t. $F$, as no resolution on $x$ is possible. We
denote with $G$ the formula $F$ after these clause additions.

(II) Weaken the clauses that are reduced and satisfied by $\omega$.
A clause $E \in F$ is called involved if it is both reduced by $\omega$ as well as
satisfied by $\omega$. The second phase weakens all involved clauses by replacing
$E$ with $(x \lor E)$ as follows. First, we add the implication $x \Rightarrow \omega$, or in
clauses $(\bar{x} \lor l)$ with $l \in \omega$. These clauses are blocked because $G$ does not
contain clauses with literal $x$. Second, we weaken the involved clauses using
self-subsuming literal addition (SSLA), since they all contain at least one
$l \in \omega$. Third, we remove the implication $x \Rightarrow \omega$. When this implication was
added, the clauses $(\bar{x} \lor l)$ with $l \in \omega$ were blocked on $\bar{x}$. Now we can remove
them, because they have RAT on $l$, which can be seen as follows. Consider
a clause containing $\bar{l}$. If it is a weakened clause $(x \lor E)$ of some $E \in F$ that
is satisfied by $\omega$, then $x$ occurs in opposite phase and the resolvent is a
tautology (same condition as for blocked clauses). Otherwise the resolvent
on $l$ of $(\bar{x} \lor l)$ with the clause containing $\bar{l}$ is subsumed by a clause $(\bar{x} \lor D)$
with $D \in F|_\omega \setminus F$ added in the first phase above. The resulting formula, where
all involved clauses in $G$ are weakened, is denoted by $G^{(\mathrm{II})}$.

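(III) Add the weakened PR clause.
Add the clause $(x \lor C)$ to $G^{(\mathrm{II})}$, resulting in $G^{(\mathrm{III})}$. The key observation
related to this phase is that $(x \lor C)$ has RAT on $x$ w.r.t. $G^{(\mathrm{II})}$: The only
clauses in $G^{(\mathrm{II})}$ that contain literal $\bar{x}$ are the ones that were added in the
first phase. We need to show that $G^{(\mathrm{II})}$ implies every clause $(x \lor C \lor D)$
with $D \in F|_\omega \setminus F$ by unit propagation. Let $\alpha$ be the smallest assignment
that falsifies $C$. Since $C$ has PR w.r.t. $F$ using witness $\omega$, we know that
$F|_\alpha \vdash_1 D$ with $D \in F|_\omega \setminus F$. This is equivalent to $F \vdash_1 (C \lor D)$ with
$D \in F|_\omega \setminus F$. Furthermore, $G^{(\mathrm{II})}|_{\bar{x}} \supseteq F$. Hence, $G^{(\mathrm{II})}|_{\bar{x}} \vdash_1 (C \lor D)$ or,
equivalently, $G^{(\mathrm{II})} \vdash_1 (x \lor C \lor D)$.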

(IV) Strengthen all weakened clauses.
The fourth phase removes all occurrences of the literal $x$ from clauses in
$G^{(\mathrm{III})}$, thereby reversing the second phase and strengthening $(x \lor C)$ to
$C$. This phase consists of three parts. First, we reintroduce the implication
$x \Rightarrow \omega$, or in clauses $(\bar{x} \lor l)$ with $l \in \omega$. These clauses have RAT on $l$
w.r.t. $G^{(\mathrm{III})}$ by the same reasoning used to remove them in the second phase
above; in case $(\bar{x} \lor l)$ can be resolved on $l$ with the only clause $(x \lor C)$
added in the third phase (thus $\bar{l} \in C$), the resolvent is a tautology (it contains
$x$ and $\bar{x}$). Afterwards, we strengthen all clauses $(x \lor E) \in G^{(\mathrm{III})}$ to $E$ as
follows. Note that this also strengthens clause $(x \lor C)$ to $C$. Observe that
all clauses $(x \lor E) \in G^{(\mathrm{III})}$, including $(x \lor C)$, are satisfied by $\omega$ and therefore
there exists a clause $(\bar{x} \lor l)$ with $l \in E$. Self-subsuming literal elimination
(SSLE) can now eliminate all literals $x$. Finally, the implication $x \Rightarrow \omega$ is no
longer required. The clauses $(\bar{x} \lor l)$ with $l \in \omega$, which by now have been added
twice, can be removed again since literal $\bar{x}$ has become pure due to the
strengthening of all clauses containing literal $x$. The resulting formula obtained
from $G^{(\mathrm{III})}$ by removing all occurrences of literal $x$ is denoted by $G^{(\mathrm{IV})}$.

(V) Remove the shortened copies.
The fifth phase reverses the first phase, and actually uses the same argument
as the fourth phase. All clauses in $G^{(\mathrm{III})}$ that contained a literal $x$
were strengthened by removing these literals in phase four. As a consequence,
the literal $\bar{x}$ is (still) pure in $G^{(\mathrm{IV})}$. The only clauses that still
contain literal $\bar{x}$ are exactly the clauses that have been added in the first
phase. Since they are all blocked on $\bar{x}$ w.r.t. $G^{(\mathrm{IV})}$, they can be eliminated
while preserving satisfiability. After removing these clauses we obtain $G^{(\mathrm{V})}$,
which equals $F \land C$.
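The five phases can be sketched in Python for a single PR step, under stated assumptions: the DIMACS-style encoding used earlier, "reduced by $\omega$" read as "contains a literal falsified by $\omega$", and hypothetical names (`convert_pr_step`, `replay`) that are not the PR2DRAT interface. As a sanity check, `replay` re-verifies every addition with the $\alpha$-reduct RAT test of Definition 3 (tried on every literal of the added clause) and applies deletions unchecked; on Example 2 it ends with exactly $F \land C$.

```python
# Assumption: literals are nonzero ints, clauses frozensets, formulas lists of clauses.

def reduct(formula, alpha):
    return [frozenset(l for l in c if -l not in alpha)
            for c in formula if not any(l in alpha for l in c)]

def up_conflict(formula, units):
    assigned = set(units)
    if any(-l in assigned for l in assigned):
        return True
    changed = True
    while changed:
        changed = False
        for c in formula:
            if any(l in assigned for l in c):
                continue
            free = [l for l in c if -l not in assigned]
            if not free:
                return True
            if len(free) == 1:
                assigned.add(free[0])
                changed = True
    return False

def implies_up(f, g):
    return all(up_conflict(f, [-l for l in d]) for d in g)

def is_rat(formula, clause):
    """Definition 3 (alpha-reduct form), tried on every literal of the clause."""
    alpha = {-l for l in clause}
    return any(implies_up(reduct(formula, alpha), reduct(formula, (alpha - {-l}) | {l}))
               for l in clause)

def convert_pr_step(F, C, omega, x):
    """Phases (I)-(V): steps ('a' = add, 'd' = delete) turning F into F and C.
    x must be a variable that occurs in neither F nor C."""
    F, C, om = [frozenset(c) for c in F], frozenset(C), set(omega)

    def satisfied(c):
        return any(l in om for l in c)

    def reduced(c):
        return any(-l in om for l in c)

    copies = []                                   # the clauses D in F|omega \ F
    for E in F:
        D = frozenset(l for l in E if -l not in om)
        if reduced(E) and not satisfied(E) and D not in F and D not in copies:
            copies.append(D)
    involved = [E for E in F if reduced(E) and satisfied(E)]
    impl = [frozenset({-x, l}) for l in sorted(om)]       # x => omega
    steps = [('a', D | {-x}) for D in copies]              # (I) blocked on -x
    steps += [('a', c) for c in impl]                      # (II) blocked on -x
    for E in involved:                                     # (II) SSLA: E -> (x or E)
        steps += [('a', E | {x}), ('d', E)]
    steps += [('d', c) for c in impl]                      # (II) drop x => omega
    steps.append(('a', C | {x}))                           # (III) RAT on x
    steps += [('a', c) for c in impl]                      # (IV) RAT on l
    for E in involved + [C]:                               # (IV) SSLE: (x or E) -> E
        steps += [('a', E), ('d', E | {x})]
    steps += [('d', c) for c in impl]                      # (IV) -x is pure again
    steps += [('d', D | {-x}) for D in copies]             # (V)
    return steps

def replay(F, steps):
    """Apply the steps, checking every addition with is_rat; deletions are unchecked."""
    G = [frozenset(c) for c in F]
    for kind, c in steps:
        if kind == 'a':
            assert is_rat(G, c), f"addition {sorted(c)} is not RAT"
            G.append(c)
        else:
            G.remove(c)
    return G

# Example 2 (x = 1, y = 2, z = 3) with the fresh variable 4.
F = [{1, 2}, {-1, 2}, {-1, 3}]
G = replay(F, convert_pr_step(F, {1}, {1, 3}, x=4))
print(sorted(sorted(c) for c in G))  # [[-1, 2], [-1, 3], [1], [1, 2]]
```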


What a Difference a Variable Makes 83 


4.3 Complexity 


In this section we analyze the worst-case complexity of converting a PR derivation
$(0, C_1, \omega_1), \dots, (0, C_n, \omega_n)$ of a formula $F_n$ from a formula $F_0$ into a DRAT
derivation $(d_1, D_1, \omega'_1), \dots, (d_m, D_m, \omega'_m)$ of $G_m = F_n$ from $G_0 = F_0$ using the
presented simulation algorithm. The number of DRAT steps that are required
to simulate a single PR addition step depends on the size of the formula. Let
$N = |F_n|$ be the number of clauses in the last formula $F_n$ and $V = |\mathrm{var}(F_n)|$ the
number of its variables. Since a PR derivation does not remove clauses, we have
$|F_i| = |F_{i-1}| + 1$ and $|\mathrm{var}(F_i)| \ge |\mathrm{var}(F_{i-1})|$. Therefore, for $i \in \{1, \dots, n\}$, $|F_i| \le N$
and $|\mathrm{var}(F_i)| \le V$. In the analysis we ignore clause deletion, since the number
of clause deletions is bounded by the number of added clauses.

In phase (I) of the conversion algorithm, copies of clauses that are reduced
but not satisfied by $\omega_i$ are added, while in phase (II) the clauses that are reduced
and satisfied by $\omega_i$ are weakened. Since a clause is either satisfied, not satisfied,
or untouched by $\omega_i$, the sum of the number of copies and weakened clauses is
at most $|F_i| \le N$. Also the implication $x \Rightarrow \omega_i$ is added in phase (II), meaning
at most $|\mathrm{var}(\omega_i)| \le |\mathrm{var}(F_i)| \le V$ clause addition steps. Phase (III) adds a
single clause. Phase (IV) again adds the implication $x \Rightarrow \omega_i$ (at most $V$ steps)
and strengthens all weakened clauses (at most $N$ steps). Phase (V) only deletes
clauses. Thus the total number of clause additions for all phases in the conversion
of a single PR step is bounded by $2V + 2N + 1$.

There are $n \le N$ additions in the PR proof, and for each addition we apply
the conversion algorithm. Hence the total number of clause addition steps in the
DRAT derivation is at most $2NV + 2N^2 + N$. Since $V \le N$ for any interesting PR
derivation, the number of steps in the resulting DRAT derivation is in $O(N^2)$.
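Written out, the count behind this bound sums the per-step bound over all $n \le N$ PR additions; the last line additionally uses $N \ge 1$ (so that $N \le N^2$) to make the constant explicit:

```latex
\begin{aligned}
\#\text{additions} &\le \sum_{i=1}^{n} (2V + 2N + 1) = n\,(2V + 2N + 1) \\
&\le N\,(2V + 2N + 1) = 2NV + 2N^2 + N \\
&\le 2N^2 + 2N^2 + N^2 = 5N^2 \qquad \text{for } V \le N,\ N \ge 1.
\end{aligned}
```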


5 Optimizations 


The simulation algorithm described in the prior section was designed to result 
in compact DRAT derivations using a single new variable, while focussing on 
converting any PR derivation into a DRAT derivation. The algorithm can be 
further optimized to reduce the size of the resulting DRAT derivations. 


5.1 Refutations 


In practice, most PR derivations are refutations, i.e., they include adding the
empty clause. When converting PR refutations, one can ignore the justification
of any weakening steps, as such steps trivially preserve unsatisfiability. The only
weakening steps in the simulation algorithm are performed in phase (II). The
purpose of adding the implication $x \Rightarrow \omega$ in phase (II) is to allow the
weakening via self-subsuming literal addition (SSLA). This justification is no
longer required for PR refutations. Without the addition of $x \Rightarrow \omega$, one can also
discard its removal. So both the first and third part of phase (II) can be omitted.
5.2 Witness Minimization


In some situations, only a subset of the involved clauses needs to be weakened
(phase (II)) and later strengthened (phase (IV)). Weakening of involved clauses
is required to make sure that the clauses $(\bar{x} \lor l)$ with $l \in \omega$ are RAT on $l$
w.r.t. $G^{(\mathrm{III})}$ in phase (IV) of the simulation algorithm. However, some of the
clauses $(\bar{x} \lor l)$ may be unit implied by others (and then need not be a RAT on
$l$). This situation occurs when a subset of the witness implies the full witness via
unit propagation. We minimize the witness by searching for the smallest witness
$\omega' \subseteq \omega$ such that $\omega'$ implies $\omega$ via unit propagation. Only clauses reduced by $\omega'$
and satisfied by $\omega$ need to be weakened in phase (II) and strengthened in phase (IV).
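How the smallest $\omega'$ is found is not specified above. As a hypothetical illustration only, the following Python sketch searches subsets of $\omega$ in order of increasing size and accepts $\omega'$ once, for every $l \in \omega \setminus \omega'$, unit propagation on the formula together with $\omega'$ and $\bar{l}$ yields a conflict (so $(\bar{x} \lor l)$ is unit implied by the clauses $(\bar{x} \lor l')$ with $l' \in \omega'$). Which formula the propagation runs on is an assumption here, and the search is exponential in $|\omega|$.

```python
from itertools import combinations

def up_conflict(formula, units):
    """True iff unit propagation on formula plus the unit literals derives a conflict."""
    assigned = set(units)
    if any(-l in assigned for l in assigned):
        return True
    changed = True
    while changed:
        changed = False
        for c in formula:
            if any(l in assigned for l in c):
                continue
            free = [l for l in c if -l not in assigned]
            if not free:
                return True
            if len(free) == 1:
                assigned.add(free[0])
                changed = True
    return False

def minimize_witness(formula, omega):
    """Brute-force search for a smallest omega' within omega that unit-implies all of omega."""
    lits = sorted(omega, key=lambda l: (abs(l), l))
    for k in range(len(lits) + 1):
        for sub in combinations(lits, k):
            if all(l in sub or up_conflict(formula, list(sub) + [-l]) for l in lits):
                return set(sub)
    return set(lits)  # not reached: the full witness always qualifies

F = [{-1, 2}, {-2, 3}]                  # a -> b and b -> c, with a = 1, b = 2, c = 3
print(minimize_witness(F, {1, 2, 3}))   # {1}
```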


5.3 Avoiding Copying 


In some quite specific cases, one can avoid copying the clauses that are reduced,
but not satisfied by the witness altogether; in other words, skip phases (I) and
(V) of the simulation algorithm. This case, however, occurred frequently in our
PR proofs. Let $\alpha$ denote the smallest assignment that falsifies the PR clause $C$
to be added. Furthermore, let $\omega$ be the witness and $\omega'$ the minimized witness as
discussed above. The condition for avoiding clause copying consists of two parts.
First, there is no literal $l \in \alpha$ such that $\bar{l} \in \omega'$. Recall that there always exists a
literal $l \in \alpha$ such that $\bar{l} \in \omega$, so witness minimization is necessary. Second, for
each literal $l \in \omega'$, the unit clause $(l)$ should be a RAT on $l$ w.r.t. the current
formula without the involved clauses under $\alpha$. Although both conditions are very
restrictive, they apply often in the PR proofs used in the evaluation.

Basically, this optimization removes phases (I) and (V), and modifies (II), 
(III), and (IV). The modified phases are named phase (i), (ii), and (iii), resp. 


(i) Weaken the clauses that are reduced by $\omega'$ and satisfied by $\omega$.
A clause $E \in F$ is called involved if it is reduced by the minimized witness $\omega'$
and satisfied by the original $\omega$. The first phase weakens all involved clauses
$E$ to $(x \lor E)$ as follows. First, we add the implication $x \Rightarrow \omega' \cup \alpha$, or in
clauses $(\bar{x} \lor l)$ with $l \in \omega' \cup \alpha$. These clauses are blocked because $G$ does
not contain clauses with literal $x$. Now we can weaken the involved clauses
using SSLA. Then we remove the implication part $x \Rightarrow \omega'$, but keep $x \Rightarrow \alpha$.
When adding this implication, the clauses $(\bar{x} \lor l)$ with $l \in \omega'$ were blocked
on $\bar{x}$. Now we can remove them, because they have RAT on $l$, as all clauses
containing $\bar{l}$ have either been weakened (if they were satisfied by $\omega$) or are
implied by $\alpha$ by the second condition. The resulting formula, $G$ in which
all involved clauses are weakened and which includes $x \Rightarrow \alpha$, is denoted by $G^{(\mathrm{i})}$.

(ii) Add the weakened PR clause.
Add the clause $(x \lor C)$, which is equivalent to the implication $x \Leftarrow \alpha$, to
$G^{(\mathrm{i})}$, resulting in $G^{(\mathrm{ii})}$. The only clauses containing literal $\bar{x}$ are the ones
that originate from $x \Rightarrow \alpha$. As a consequence, $(x \lor C)$ is blocked on $x$
w.r.t. $G^{(\mathrm{i})}$.
(iii) Strengthen all weakened clauses.
The third phase removes all occurrences of the literal $x$ from clauses in
$G^{(\mathrm{ii})}$, thereby reversing the weakening of phase (i) and strengthening $(x \lor C)$ to $C$.
This phase consists of four parts. First, we reintroduce the implication part
$x \Rightarrow \omega'$, or in clauses $(\bar{x} \lor l)$ with $l \in \omega'$. Again, these clauses have RAT
on $l$ w.r.t. $G^{(\mathrm{ii})}$. Second, we remove the implication part $x \Rightarrow \alpha$, i.e., the
clauses $(\bar{x} \lor l)$ with $l \in \alpha$. Afterwards, we strengthen $(x \lor C)$ to $C$ and
all clauses $(x \lor E) \in G^{(\mathrm{ii})}$ to $E$. Observe that all clauses $(x \lor E) \in G^{(\mathrm{ii})}$,
including $(x \lor C)$, are satisfied by $\omega$ and therefore there exists a clause
$(\bar{x} \lor l)$ with $l \in E$. SSLE can therefore remove all literals $x$. Finally, the
implication $x \Rightarrow \omega'$ is no longer required. The clauses $(\bar{x} \lor l)$ with $l \in \omega'$ can
be eliminated because literal $\bar{x}$ has become pure due to the strengthening
of all clauses containing literal $x$. The resulting formula, i.e., $G^{(\mathrm{ii})}$ after
removing all occurrences of literal $x$, is denoted by $G^{(\mathrm{iii})}$ and equals $G \land C$.


In case the PR derivation is a refutation, we can further optimize this case
by changing phase (i) as follows: instead of adding the implication $x \Rightarrow \omega' \cup \alpha$,
only the implication $x \Rightarrow \alpha$ is added. Without the addition of the implication part
$x \Rightarrow \omega'$, we can also discard removing that part at the end of phase (i).


6 Alternative Simulation Algorithms 


Even though the conversion from PR derivations to DRAT derivations is arguably 
the most useful one in practice, one can also consider the following alternatives. 


6.1 Limiting the Number of RAT Steps 


Most steps in the simulation algorithm are “basic” steps, i.e., self-subsuming
literal addition or elimination and blocked clause addition or elimination. There
are only a few “full” RAT addition steps: the removal of the implication in phase (II),
the addition of the weakened PR clause in phase (III), and the addition of the
implication in phase (IV). It is interesting to explore the option to reduce the
number of these “full” RAT addition steps. Eliminating “full” RAT addition steps
brings us close to a simulation algorithm with only basic steps.

It is easy to eliminate all but one “full” RAT addition step. In order to
eliminate the RAT steps in phase (II), one can weaken the clauses that are reduced
but not satisfied by the witness (i.e., add a literal $x$ using SSLA), using the
shortened copies of clauses that are reduced, but not satisfied by $\omega$. After the
weakening, we can remove the implication $x \Rightarrow \omega$ using blocked clause elimination
(instead of RAT), because now all clauses that are touched by $\omega$ have a literal $x$.
Therefore all clauses $(\bar{x} \lor l)$ with $l \in \omega$ are blocked on $l$. The weakening also
allows adding the implication $x \Rightarrow \omega$ in phase (IV) using blocked clause addition
steps (instead of RAT). The strengthening of the newly weakened clauses can be
performed in phase (IV) using SSLE (after adding the implication). It is not
obvious how to replace the only remaining RAT addition in phase (III) using basic steps.
6.2 Converting DPR Proofs into DRAT Proofs


So far we only considered converting a PR clause addition as a sequence of
DRAT steps and ignored deletion of PR clauses from a formula. In most cases,
clause deletion steps in a proof serve to make checking a proof of unsatisfiability
more efficient and can therefore be performed without any checking. However,
there are situations in which one wants to check the validity of clause deletion
steps, in particular for proofs of satisfiability, i.e., sequences of proof steps that
show that a given formula is satisfiability equivalent to the empty formula and
thus satisfiable.

The DPR proof system is a clausal proof system that allows the addition
and deletion of PR clauses. Conversion of a PR clause addition step into DRAT
proof steps is equivalent to the conversion of such a step in the PR proof system.
The conversion of a PR clause deletion step is slightly different. Consider a formula
$F$ and a clause $C \in F$ which is a PR clause w.r.t. $F$ with witness $\omega$. The first
phase of the conversion is exactly the same as phase (I) of the PR clause addition
conversion. The second phase of the conversion is slightly different compared to
phase (II) of the PR clause addition conversion: instead of weakening all clauses
reduced and satisfied by $\omega$, we weaken all clauses satisfied by $\omega$. Notice that this
includes weakening $C$ to $(x \lor C)$. The third phase consists of deleting $(x \lor C)$ from
the current formula. Recall that phase (III) of the PR clause addition conversion
added $(x \lor C)$. The final phase corresponds to phases (IV) and (V).


6.3 Converting PR Refutations into RAT Refutations 


The presented simulation algorithm converts PR derivations into DRAT
derivations. We selected the DRAT proof system as target because it is the proof
system most widely supported by top-tier SAT solvers and it allows step-wise
simulation using deletion steps. The question arises whether deletion steps are
required when converting a PR refutation. In short, the answer is no when
allowing the introduction of arbitrarily many new Boolean variables. Converting a
deletion step can be realized as follows. Let $C$ be the clause that is deleted from a
formula $F$. For each $x \in \mathrm{var}(C)$, add to $F$ the equivalence $x' \Leftrightarrow x$ with $x'$ being
a new variable. Afterwards, copy all clauses in $F$ (apart from $C$) that contain
at least one literal $l$ with $\mathrm{var}(l) \in \mathrm{var}(C)$, using the new $x'$ variables instead of
the old $x$ variables. Finally, replace all occurrences of old literals $x$ and $\bar{x}$ in the
remaining proof by literals $x'$ and $\bar{x}'$, respectively.
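The renaming step can be sketched in Python as follows, assuming the DIMACS-style encoding used earlier and that all variables of $F$ are smaller than `next_var`; `rename_for_deletion` is a hypothetical helper. It returns the clauses to add (both directions of each equivalence $x' \Leftrightarrow x$, then the renamed copies) and the rest of the proof with the variables of $C$ renamed; it does not re-check the RAT status of the additions.

```python
def rename_for_deletion(formula, clause, next_var, rest_of_proof):
    """Simulate deleting `clause` by renaming its variables to fresh ones.
    Assumption: every variable of formula is smaller than next_var."""
    clause = frozenset(clause)
    fresh = {v: next_var + i for i, v in enumerate(sorted({abs(l) for l in clause}))}

    def rename(c):
        return frozenset(fresh.get(abs(l), abs(l)) * (1 if l > 0 else -1) for l in c)

    additions = []
    for old, new in fresh.items():
        additions += [frozenset({-new, old}), frozenset({new, -old})]  # x' <=> x
    for d in map(frozenset, formula):
        if d != clause and any(abs(l) in fresh for l in d):
            additions.append(rename(d))  # copy using the new variables
    return additions, [rename(c) for c in rest_of_proof]

F = [{1, 2}, {-1, 3}, {2, 3}]
adds, rest = rename_for_deletion(F, {1, 2}, next_var=4, rest_of_proof=[{1, -2}])
print(len(adds), rest)
```

Grouping consecutive deletions, as described next, amounts to calling such a helper once per group instead of once per deleted clause.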

In order to limit the number of copy operations, one can group (consecutive)
deletion steps and use the same variables $x'$ for the group. The simulation
algorithm can be partitioned into two groups of (consecutive) clause addition
steps that are each followed by groups of consecutive clause deletion steps: The
first group of addition steps consists of phase (I) and the first half of phase (II),
i.e., adding the implication $x \Rightarrow \omega$ and the weakened involved clauses. The first
group of deletion steps consists of the remaining part of phase (II), i.e., deletion
of the involved clauses and deletion of the implication $x \Rightarrow \omega$. The second
group of consecutive addition steps consists of phase (III) and the first half of
phase (IV), i.e., adding the implication $x \Rightarrow \omega$ and adding back the involved
clauses. The second group of consecutive deletion steps consists of the remaining
part of phase (IV), i.e., removal of the weakened involved clauses and the
implication $x \Rightarrow \omega$, and phase (V). By grouping the deletion steps, one can convert
PR refutations into RAT refutations with at most a quadratic blowup, so the
same worst-case complexity as converting PR derivations into DRAT derivations.


7 Evaluation 


We implemented a tool, called PR2DRAT, to convert PR proofs into DRAT proofs¹
and evaluated the tool on short PR proofs for hard formulas from three families:


(1) pigeon-hole, (2) two-pigeons-per-hole [2], and (3) Tseitin formulas [4,27]. 


Every resolution proof of a formula in these families is exponential in the size 
of the formula [11,28]. As a consequence, any CDCL solver without dedicated 
special reasoning techniques, such as cardinality or XOR reasoning, is unable to 
solve these benchmarks in reasonable time. In contrast, our PR proofs are smaller 
than the formulas, so linear in size. The PR proofs of the pigeon-hole formu- 
las and two-pigeons-per-hole formulas have been constructed manually in earlier 
work [17]. The proofs of the Tseitin formulas have been manually constructed by 
expressing Gaussian elimination in the PR system. Applying Gaussian elimina- 
tion —after syntactically extracting XOR constraints from the CNF formulas— 
is enough to solve these formulas. We will first evaluate the size of the conver- 
sion. Afterwards we certify for the first time the short PR proofs by converting 
them into DRAT proofs which are checked by a formally verified checker. 


7.1 Proof Simulation and Optimization 


We will compare three kinds of DRAT proofs for the benchmarks used in the 
experiments: the most compact existing ones [14,15], the proofs obtained from 
using our plain conversion algorithm, and the proofs obtained from our optimized 
algorithm. The most compact existing ones originate from expressing symmetry 
breaking as DRAT proof steps. Table 1 shows the comparison. All proofs have 
been trimmed using the DRAT-trim tool [12] once. Applying DRAT-trim multiple 
rounds (using the output proof as input proof for the next round) allows further 
reduction of the proof size, but typically these extra reductions are small. 

For pigeon-hole formulas over $n$ pigeons, the most compact existing proofs
have $O(n^4)$ proof steps. This is also the case for the DRAT proofs obtained
through our basic conversion algorithm as well as for the extended resolution
proofs by Cook [5]. However, DRAT proofs obtained with our optimized algorithm
have only $O(n^3)$ proof steps. Notice that the size of pigeon-hole formulas
as well as the size of PR proofs are both in $O(n^3)$. In other words, our optimized
conversion algorithm can not only produce DRAT proofs, but for pigeon-hole
formulas it generates the first DRAT proofs of linear size.


1 The tool, checkers, formulas, and proofs discussed in this section are available at
http://www.cs.utexas.edu/~marijn/pr2drat/.
Table 1. Comparison of the size of trimmed, generated DRAT proofs for hard formulas.
The size of proofs is measured in the number of clause addition steps (#add). We denote
with “—” that no DRAT proof is available. Bold is used for the smallest DRAT proofs.

                  input              PR proofs   DRAT proofs (#add)
formula           #var    #cls       #add        existing [14,15]   plain        optimized
hole20            420     4,221      2,870       49,410             94,901       26,547
hole30            930     13,981     9,455       234,195            422,101      89,827
hole40            1,640   32,841     22,140      715,030            1,241,126    213,107
hole50            2,550   63,801     42,925      1,708,915          2,893,476    416,387
tph8              136     5,457      1,156       253,958            86,216       25,204
tph12             300     27,625     3,950       1,966,472          612,108      127,296
tph16             528     87,329     9,416       —                  2,490,672    401,004
tph20             820     213,241    18,450      —                  7,440,692    976,376
Urquhart-s5-b1    106     714        620         —                  30,235       28,189
Urquhart-s5-b2    107     742        606         —                  34,535       32,574
Urquhart-s5-b3    121     1,116      692         —                  44,117       41,230
Urquhart-s5-b4    114     888        636         —                  40,598       37,978


The results for the two-pigeons-per-hole formulas are similar, but more pro- 
nounced: There exist only DRAT proofs of the formulas up to 12 holes and 25 
pigeons (tph12) [15]. Our plain simulation algorithm can produce DRAT proofs 
of the formulas up to 20 holes and 41 pigeons (tph20). Moreover, our optimized 
simulation algorithm is able to produce proofs that are linear in size of the 
formulas, although not linear in the size of the PR proofs. 

We are unaware of any DRAT proofs of hard Tseitin formulas, e.g., from the
Urquhart-s5-b* family [4], nor of any tool able to produce such DRAT proofs.
However, we succeeded in manually producing short PR proofs without new
variables for these formulas and converting them into DRAT proofs. The resulting
DRAT proofs, with and without optimizations, are relatively large compared to the
PR proofs. The blowup is close to the quadratic worst case. We observed that
DRAT-trim was able to remove many (around 70%) of the clause additions, which
suggests that there could be an optimization to generate shorter DRAT proofs.


7.2 Verified PR Proof Checking 


Our proof simulation approach can be used to validate PR proofs with formally
verified tools and thereby increase the confidence in their correctness. The tool
chain works as follows: Given a formula $F$ and an alleged PR proof $P_{\mathrm{PR}}$ of $F$,
our tool PR2DRAT converts $P_{\mathrm{PR}}$ into a DRAT proof $P_{\mathrm{DRAT}}$. Afterwards, we use the
DRAT-trim tool to convert $P_{\mathrm{DRAT}}$ into a CLRAT (compressed linear RAT) proof
$P_{\mathrm{CLRAT}}$. CLRAT proofs can be efficiently checked using formally verified
checkers [7]. We used the verified checker ACL2check [13] to certify that $P_{\mathrm{CLRAT}}$ is a
valid proof of unsatisfiability of $F$. Notice that the tools PR2DRAT and DRAT-trim
are unverified and thus may turn an invalid proof into a valid proof or vice versa.

Figure 1 shows the results of applying this tool chain on the benchmark suite.
The PR2DRAT tool was able to convert each PR proof into a DRAT proof in less
than a minute, and half of the proofs in less than a second. The runtimes of
DRAT-trim and ACL2check are one to two orders of magnitude higher than for
PR2DRAT. Thus our tool adds little overhead to the tool chain. The sizes of
the DRAT and CLRAT proofs are comparable. However, these proofs are
different: DRAT-trim (A) removes redundant clause additions; (B) includes hints to
speed up verified checking; (C) compresses proofs. The effect of (A) depends on
proof quality; (B) increases the size of proofs of small hard problems by roughly a
factor of four; (C) reduces size to 30% of the uncompressed proofs. The difference
between the DRAT and CLRAT proofs therefore indicates how much redundancy
was removed: for pigeon-hole proofs hardly anything, for two-pigeons-per-hole
proofs a modest amount, and for Tseitin proofs a lot. Notice that runtimes of the
verified checker ACL2check are comparable to the C-based checker DRAT-trim.

Fig. 1. Certification of PR proofs using PR2DRAT, DRAT-trim, and the formally verified
checker ACL2check. Left: the sizes of proofs in the PR, DRAT, and CLRAT formats, in
bytes; right: the proof conversion and checking times, in seconds. No times are shown
for the Urquhart instances as all times were less than a second.


8 Conclusions and Future Work 


We showed how to convert PR proofs into DRAT proofs using only a single new 
variable with an at most quadratic blowup in proof size. This result suggests 
that it might also be possible to construct DRAT proofs without new variables 
using one variable elimination step and reusing the eliminated variable. The 
optimizations implemented in our conversion tool PR2DRAT made it possible to 
produce DRAT proofs for hard problems that are significantly smaller than
existing DRAT proofs of those problems. The main open question is whether
PR proofs can be converted into RAT proofs (i.e., without deletion
steps) with a small number of new variables. Without deletion steps, it seems
that copying the formula using new variables is required. 
Our new tool chain for certifying SAT solving results using PR proofs consists
of four steps: proof production (solving), conversion from PR to DRAT, conversion
from DRAT to CLRAT, and validation of the CLRAT proof using a formally
verified checker. To speed up adoption of the approach, we are exploring the
elimination of the second step by integrating the conversion algorithm into either
SAT solvers or DRAT proof checkers.
Abstract. Alternating automata have been widely used to model and
verify systems that handle data from finite domains, such as communication
protocols or hardware. The main advantage of the alternating model
of computation is that complementation is possible in linear time, which
allows the trace inclusion problems that often occur in verification to be
encoded concisely. In this paper we consider alternating automata over infinite
alphabets, whose transition rules are formulae in a combined theory of
Booleans and some infinite data domain, that relate past and current values
of the data variables. The data theory is not fixed, but rather it is a
parameter of the class. We show that union, intersection and complementation
are possible in linear time in this model and, though the emptiness
problem is undecidable, we provide two efficient semi-algorithms,
inspired by two state-of-the-art abstraction refinement model checking
methods: lazy predicate abstraction [8] and the IMPACT semi-algorithm
[17]. We have implemented both methods and report the results of an
experimental comparison.


1 Introduction 


The language inclusion problem is recognized as being central to verification of 
hardware, communication protocols and software systems. A property is a spec- 
ification of the correct executions of a system, given as a set P of executions, 
and the verification problem asks if the set S of executions of the system under 
consideration is contained within P. This problem is at the core of widespread 
verification techniques, such as automata-theoretic model checking [23], where 
systems are specified as finite-state automata and properties defined using Linear 
Temporal Logic [21]. However, the bottleneck of this and other related verification
techniques is the intractability of language inclusion (PSPACE-complete for
finite-state automata over finite alphabets).

Alternation [3] was introduced as a generalization of nondeterminism, introducing
universal transitions in addition to existential ones. For automata over finite
alphabets, the language inclusion problem can be encoded as the emptiness
problem of an alternating automaton of linear size. Moreover, efficient exploration
techniques based on antichains have been shown to perform well for alternating
automata over finite alphabets [5].
© The Author(s) 2018 
Using finite alphabets for the specification of properties and models is, however,
very restrictive when dealing with real-life computer systems, for the
following reasons. On the one hand, programs handle data from very large
domains that can be assumed to be infinite (64-bit integers, floating-point
numbers, character strings, etc.), and their correctness must be specified in
terms of the data values. On the other hand, systems must respond to strict
deadlines, which requires temporal specifications as timed languages [1].

Although they are convenient specification tools, automata over infinite
alphabets lack the decidability properties ensured by finite alphabets. In general,
when considering infinite data as part of the input alphabet, language inclusion
is undecidable and even complementation becomes impossible, for instance, for
timed automata [1] or finite-memory register automata [13]. One can recover
theoretical decidability by restricting the number of variables (clocks) in timed
automata to one [20], or by forbidding relations between current and past/future
values, as with symbolic automata [24]. In such cases, the emptiness problem
for the alternating versions also becomes decidable [4, 14].

In this paper, we present a new model of alternating automata over infinite
alphabets consisting of pairs $(a, \nu)$ where $a$ is an input event from a finite set and
$\nu$ is a valuation of a finite set $\mathbf{x}$ of variables that range over an infinite domain.
We assume that, at all times, the successive values taken by the variables in 
x are an observable part of the language, in other words, there are no hidden 
variables in our model. The transition rules are specified by a set of formulae, 
in a combined first-order theory of Boolean control states and data, that relate 
past with present values of the variables. We do not fix the data theory a priori, 
but rather consider it to be a parameter of the class. 

A run over an input word $(a_1,\nu_1)\ldots(a_n,\nu_n)$ is a sequence
$\phi_0(\mathbf{x}_0) \Rightarrow \phi_1(\mathbf{x}_0,\mathbf{x}_1) \Rightarrow \ldots \Rightarrow \phi_n(\mathbf{x}_0,\ldots,\mathbf{x}_n)$ of rewritings of the initial formula by
substituting Boolean states with time-stamped transition rules. The word is
accepted if the final formula $\phi_n(\mathbf{x}_0,\ldots,\mathbf{x}_n)$ holds when all time-stamped
variables $\mathbf{x}_1,\ldots,\mathbf{x}_n$ are substituted by their values in $\nu_1,\ldots,\nu_n$, all non-final
states are replaced by false and all final states by true.

The Boolean operations of union, intersection and complement can be imple- 
mented in linear time in this model, thus matching the complexity of per- 
forming these operations in the finite-alphabet case. The price to be paid is 
that emptiness becomes undecidable, for which reason we provide two efficient 
semi-algorithms for emptiness, based on lazy predicate abstraction [8] and the 
IMPACT method [17]. These algorithms are proven to terminate and return a 
word from the language of the automaton, if one exists, but termination is not 
guaranteed when the language is empty. 

We have implemented the Boolean operations and emptiness checking semi- 
algorithms and carried out experiments with examples taken from array log- 
ics [2], timed automata [9], communication protocols [25] and hardware verifica- 
tion [22]. 


Related Work. Data languages and automata have been defined previously, 
in a classical nondeterministic setting. For instance, Kaminski and Francez [13] 
consider languages, over an infinite alphabet of data, recognized by automata
with a finite number of registers that store the input data and compare it
using equality. Just as the timed languages recognized by timed automata [1],
these languages, called quasi-regular, are not closed under complement, but their
emptiness is decidable. The impossibility of complementation here is caused
by the use of hidden variables, which we do not allow. Emptiness is, however,
undecidable in our case, mainly because many data theories allow counting
(incrementing data values and comparing them to a constant).

Another related model is that of predicate automata [6], which recognize 
languages over integer data by labeling the words with conjunctions of uninter- 
preted predicates. We intend to explore further the connection with our model 
of alternating data automata, in order to apply our method to the verification 
of parallel programs. 

The model presented in this paper stems from the language inclusion problem 
considered in [11]. There we provide a semi-algorithm for inclusion of data lan- 
guages, based on an exponential determinization procedure and an abstraction 
refinement loop using lazy predicate abstraction [8]. In this work we consider 
the full model of alternation and rely entirely on the ability of SMT solvers to 
produce interpolants in the combined theory of Booleans and data. Since deter- 
minisation is not needed and complementation is possible in linear time, the 
bulk of the work is carried out by the solver. 

The emptiness check for alternating data automata adapts similar semi- 
algorithms for nondeterministic infinite-state programs to the alternating model 
of computation. In particular, we considered the state-of-the-art IMPACT procedure
[17], which has been shown to outperform lazy predicate abstraction [8] in the
nondeterministic case, and generalized it to cope with alternation. More recent
approaches for interpolant-based abstraction refinement target Horn systems 
[10,18], used to encode recursive and concurrent programs [7]. However, the 
emptiness of alternating word automata cannot be directly encoded using Horn 
clauses, because all the branches of the computation synchronize on the same 
input, which cannot be encoded by a finite number of local (equality) constraints. 
We believe that the lazy annotation techniques for Horn clauses are suited for 
branching computations, which we intend to consider in a future tree automata 
setting. 


2 Preliminaries 


A signature $\mathbf{S} = (\mathbf{S}^{s}, \mathbf{S}^{f})$ consists of a set $\mathbf{S}^{s}$ of sort symbols and a set $\mathbf{S}^{f}$ of
sorted function symbols. To simplify the presentation, we assume w.l.o.g. that
$\mathbf{S}^{s} = \{\mathsf{Data}, \mathsf{Bool}\}$¹ and each function symbol $f \in \mathbf{S}^{f}$ has $\#(f) \geq 0$ arguments
of sort Data and return value $\sigma(f) \in \mathbf{S}^{s}$. If $\#(f) = 0$ then $f$ is a constant. We
consider constants $\top$ and $\bot$ of sort Bool.


1 The generalization to more than two sorts is straightforward, but would unnecessarily
clutter the technical presentation.
Let $\mathsf{Var}$ be an infinite countable set of variables, where each $x \in \mathsf{Var}$ has
an associated sort $\sigma(x)$. A term $t$ of sort $\sigma(t) = S$ is a variable $x \in \mathsf{Var}$ where
$\sigma(x) = S$, or $f(t_1,\ldots,t_{\#(f)})$ where $t_1,\ldots,t_{\#(f)}$ are terms of sort Data and
$\sigma(f) = S$. An atom is a term of sort Bool or an equality $t \approx s$ between two terms
of sort Data. A formula is an existentially quantified combination of atoms using
disjunction $\vee$, conjunction $\wedge$ and negation $\neg$, and we write $\phi \rightarrow \psi$ for $\neg\phi \vee \psi$.

We denote by $FV^{\sigma}(\phi)$ the set of free variables of sort $\sigma$ in $\phi$ and write $FV(\phi)$
for $\bigcup_{\sigma \in \mathbf{S}^{s}} FV^{\sigma}(\phi)$. For a variable $x \in FV(\phi)$ and a term $t$ such that $\sigma(t) = \sigma(x)$,
let $\phi[t/x]$ be the result of replacing each occurrence of $x$ by $t$. For indexed sets
$\mathbf{t} = \{t_1,\ldots,t_n\}$ and $\mathbf{x} = \{x_1,\ldots,x_n\}$, we write $\phi[\mathbf{t}/\mathbf{x}]$ for the formula obtained
by simultaneously replacing $x_i$ with $t_i$ in $\phi$, for all $i \in [1,n]$. The size $|\phi|$ is the
number of symbols occurring in $\phi$.

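An interpretation $\mathcal{I}$ maps (1) the sort Data into a non-empty set $\mathsf{Data}^{\mathcal{I}}$, (2)
the sort Bool into the set $\mathbb{B} = \{\mathit{true}, \mathit{false}\}$, where $\top^{\mathcal{I}} = \mathit{true}$, $\bot^{\mathcal{I}} = \mathit{false}$,
and (3) each function symbol $f$ into a total function $f^{\mathcal{I}} : (\mathsf{Data}^{\mathcal{I}})^{\#(f)} \rightarrow \sigma(f)^{\mathcal{I}}$,
or an element of $\sigma(f)^{\mathcal{I}}$ when $\#(f) = 0$. Given an interpretation $\mathcal{I}$, a valuation
$\nu$ maps each variable $x \in \mathsf{Var}$ into an element $\nu(x) \in \sigma(x)^{\mathcal{I}}$. For a term $t$,
we denote by $t^{\mathcal{I}}_{\nu}$ the value obtained by replacing each function symbol $f$ by its
interpretation $f^{\mathcal{I}}$ and each variable $x$ by its valuation $\nu(x)$. For a formula $\phi$, we
write $\mathcal{I}, \nu \models \phi$ if the formula obtained by replacing each term $t$ in $\phi$ by the value
$t^{\mathcal{I}}_{\nu}$ is logically equivalent to true.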

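A formula $\phi$ is satisfiable in the interpretation $\mathcal{I}$ if there exists a valuation $\nu$
such that $\mathcal{I}, \nu \models \phi$, and valid if $\mathcal{I}, \nu \models \phi$ for all valuations $\nu$. The theory $\mathbf{T}(\mathbf{S}, \mathcal{I})$
is the set of valid formulae written in the signature $\mathbf{S}$, with the interpretation $\mathcal{I}$.
A decision procedure for $\mathbf{T}(\mathbf{S}, \mathcal{I})$ is an algorithm that takes a formula $\phi$ in the
signature $\mathbf{S}$ and returns yes iff $\phi \in \mathbf{T}(\mathbf{S}, \mathcal{I})$.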

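Given formulae $\phi$ and $\psi$, we say that $\phi$ entails $\psi$, denoted $\phi \models^{\mathcal{I}} \psi$, iff $\mathcal{I}, \nu \models \phi$
implies $\mathcal{I}, \nu \models \psi$, for each valuation $\nu$, and $\phi \Leftrightarrow^{\mathcal{I}} \psi$ iff $\phi \models^{\mathcal{I}} \psi$ and $\psi \models^{\mathcal{I}} \phi$. We
omit mentioning the interpretation $\mathcal{I}$ when it is clear from the context.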


3 Alternating Data Automata 


In the rest of this section we fix an interpretation $\mathcal{I}$ and a finite alphabet $\Sigma$ of
input events. Given a finite set $\mathbf{x} \subseteq \mathsf{Var}$ of variables of sort Data, let $\mathbf{x} \mapsto \mathsf{Data}^{\mathcal{I}}$
be the set of valuations of the variables $\mathbf{x}$ and $\Sigma[\mathbf{x}] = \Sigma \times (\mathbf{x} \mapsto \mathsf{Data}^{\mathcal{I}})$ be
the set of data symbols. A data word (word in the sequel) is a finite sequence
$(a_1,\nu_1)(a_2,\nu_2)\ldots(a_n,\nu_n)$ of data symbols, where $a_1,\ldots,a_n \in \Sigma$ and $\nu_1,\ldots,\nu_n : \mathbf{x} \rightarrow \mathsf{Data}^{\mathcal{I}}$
are valuations. We denote by $\varepsilon$ the empty sequence, by $\Sigma^*$ the set
of finite sequences of input events and by $\Sigma[\mathbf{x}]^*$ the set of data words over $\mathbf{x}$.
This definition generalizes the classical notion of words from a finite alphabet
to the possibly infinite alphabet $\Sigma[\mathbf{x}]$. Clearly, when $\mathsf{Data}^{\mathcal{I}}$ is sufficiently large
or infinite, we can map the elements of $\Sigma$ into designated elements of $\mathsf{Data}^{\mathcal{I}}$ and
use a special variable to encode the input events. However, keeping $\Sigma$ explicit
simplifies several technical points below, without cluttering the presentation.

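Given sets of variables $\mathbf{b}, \mathbf{x} \subseteq \mathsf{Var}$ of sort Bool and Data, respectively, we
denote by $\mathsf{Form}(\mathbf{b},\mathbf{x})$ the set of formulae $\phi$ such that $FV^{\mathsf{Bool}}(\phi) \subseteq \mathbf{b}$ and
$FV^{\mathsf{Data}}(\phi) \subseteq \mathbf{x}$. By $\mathsf{Form}^{+}(\mathbf{b},\mathbf{x})$ we denote the set of formulae from $\mathsf{Form}(\mathbf{b},\mathbf{x})$
in which each Boolean variable occurs under an even number of negations.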

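An alternating data automaton (ADA or automaton in the sequel) is a tuple
$\mathcal{A} = \langle \mathbf{x}, Q, \iota, F, \Delta \rangle$, where:

- $\mathbf{x} \subseteq \mathsf{Var}$ is a finite set of variables of sort Data,

- $Q \subseteq \mathsf{Var}$ is a finite set of variables of sort Bool (states),

- $\iota \in \mathsf{Form}^{+}(Q, \emptyset)$ is the initial configuration,

- $F \subseteq Q$ is a set of final states, and

- $\Delta : Q \times \Sigma \rightarrow \mathsf{Form}^{+}(Q, \bar{\mathbf{x}} \cup \mathbf{x})$ is a transition function, where $\bar{\mathbf{x}}$ denotes
$\{\bar{x} \mid x \in \mathbf{x}\}$.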


In each formula $\Delta(q,a)$ describing a transition rule, the variables $\bar{\mathbf{x}}$ track
the previous and $\mathbf{x}$ the current values of the variables of $\mathcal{A}$. Observe that
the initial values of the variables are left unconstrained, as the initial configuration
does not contain free data variables. The size of $\mathcal{A}$ is defined as

$$|\mathcal{A}| = |\iota| + \sum_{(q,a) \in Q \times \Sigma} |\Delta(q,a)|.$$


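$$
\begin{aligned}
\Delta(q_0,a) &= q_1 \wedge q_2 \wedge x \approx 0 \wedge y \approx 0 \\
\Delta(q_1,a) &= q_1 \wedge q_2 \wedge x \approx \bar{y} + 1 \wedge y \approx \bar{x} + 1 \\
\Delta(q_1,b) &= q_3 \wedge \bar{x} \geq \bar{y} \\
\Delta(q_2,a) &= q_2 \wedge x > \bar{x} \wedge y > \bar{y} \\
\Delta(q_2,b) &= q_4 \wedge \bar{x} > \bar{y}
\end{aligned}
$$

(a) automaton diagram (not reproduced)    (b) transition rules, listed above

Fig. 1. Alternating data automaton example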


Example. Figure 1(a) depicts an ADA with input alphabet $\Sigma = \{a, b\}$, variables
$\mathbf{x} = \{x, y\}$, states $Q = \{q_0, q_1, q_2, q_3, q_4\}$, initial configuration $q_0$, final states
$F = \{q_3, q_4\}$ and transitions given in Fig. 1(b), where missing rules, such as
$\Delta(q_0, b)$, are assumed to be $\bot$. Rules $\Delta(q_0, a)$ and $\Delta(q_1, a)$ are universal and there
are no existential nondeterministic rules. Rules $\Delta(q_1, a)$ and $\Delta(q_2, a)$ compare
past $(\bar{x}, \bar{y})$ with present $(x, y)$ values, $\Delta(q_0, a)$ constrains the present and $\Delta(q_1, b)$,
$\Delta(q_2, b)$ the past values, respectively.

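Formally, let $\mathbf{x}_k = \{x_k \mid x \in \mathbf{x}\}$, for any $k \geq 0$, be a set of time-stamped
variables. For an input event $a \in \Sigma$ and a formula $\phi$, we write $\Delta(\phi, a)$ (respectively
$\Delta^k(\phi, a)$) for the formula obtained from $\phi$ by simultaneously replacing each
state $q \in FV^{\mathsf{Bool}}(\phi)$ by the formula $\Delta(q, a)$ (respectively $\Delta(q, a)[\mathbf{x}_k/\bar{\mathbf{x}}, \mathbf{x}_{k+1}/\mathbf{x}]$,
for $k \geq 0$). Given a word $w = (a_1,\nu_1)(a_2,\nu_2)\ldots(a_n,\nu_n)$, the run of $\mathcal{A}$ over $w$ is
the sequence of formulae:

$$\phi_0(Q) \Rightarrow \phi_1(Q, \mathbf{x}_0 \cup \mathbf{x}_1) \Rightarrow \ldots \Rightarrow \phi_n(Q, \mathbf{x}_0 \cup \ldots \cup \mathbf{x}_n)$$

where $\phi_0 = \iota$ and, for all $k \in [1,n]$, we have $\phi_k = \Delta^{k-1}(\phi_{k-1}, a_k)$. Next, we slightly
abuse notation and write $\Delta(\iota, a_1, \ldots, a_n)$ for the formula $\phi_n(\mathbf{x}_0,\ldots,\mathbf{x}_n)$ above.
We say that $\mathcal{A}$ accepts $w$ iff $\mathcal{I}, \nu \models \Delta(\iota, a_1, \ldots, a_n)$, for some valuation $\nu$ that
maps: (1) each $x_k \in \mathbf{x}_k$ to $\nu_k(x)$, for all $k \in [1,n]$, (2) each $q \in FV^{\mathsf{Bool}}(\phi_n) \cap F$ to
$\top$ and (3) each $q \in FV^{\mathsf{Bool}}(\phi_n) \setminus F$ to $\bot$. The language of $\mathcal{A}$ is the set $\mathcal{L}(\mathcal{A})$ of
words from $\Sigma[\mathbf{x}]^*$ accepted by $\mathcal{A}$.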


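Example. The following sequence is a non-accepting run of the ADA from Fig. 1
on the word $(a,(0,0)), (a,(1,1)), (b,(2,1))$, where $\mathsf{Data}^{\mathcal{I}} = \mathbb{Z}$ and the function
symbols have the standard arithmetic interpretation (each underbrace is labeled with the
state whose rule produced the subformula):

$$
\begin{aligned}
q_0 &\overset{(a,(0,0))}{\Longrightarrow} q_1 \wedge q_2 \wedge x_1 \approx 0 \wedge y_1 \approx 0 \\
&\overset{(a,(1,1))}{\Longrightarrow} \underbrace{q_1 \wedge q_2 \wedge x_2 \approx y_1 + 1 \wedge y_2 \approx x_1 + 1}_{q_1} \wedge \underbrace{q_2 \wedge x_2 > x_1 \wedge y_2 > y_1}_{q_2} \wedge x_1 \approx 0 \wedge y_1 \approx 0 \\
&\overset{(b,(2,1))}{\Longrightarrow} \underbrace{q_3 \wedge x_2 \geq y_2}_{q_1} \wedge \underbrace{q_4 \wedge x_2 > y_2}_{q_2} \wedge x_2 \approx y_1 + 1 \wedge y_2 \approx x_1 + 1 \wedge \underbrace{q_4 \wedge x_2 > y_2}_{q_2} \wedge x_2 > x_1 \wedge y_2 > y_1 \wedge x_1 \approx 0 \wedge y_1 \approx 0
\end{aligned}
$$

The run is not accepting because, with $x_2 = y_2 = 1$, the conjunct $x_2 > y_2$ contributed
by $\Delta(q_2, b)$ is false.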
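When all data values of the word are given, acceptance can be checked by evaluating the run directly. Every occurrence of a state $q$ in $\phi_k$ is replaced by the same time-stamped rule, so the run can be evaluated by recursing over (state, position) pairs. The Python sketch below is a minimal illustration under our own assumptions. Rules are encoded as Python predicates rather than formulae. We also assume that rules fired on the first event do not read $\bar{\mathbf{x}}$, which holds for Fig. 1, since $\mathbf{x}_0$ is left unconstrained by the definition. The sketch is not an emptiness check.

```python
import functools
from typing import Callable, Dict, List, Optional, Set, Tuple

Valuation = Dict[str, int]
StateTruth = Callable[[str], bool]
# Our own encoding of a rule Delta(q, a): a predicate over
#   st   -- truth value of each successor state,
#   past -- values of x-bar (None on the first event, where x_0 is left free),
#   cur  -- values of x.
Rule = Callable[[StateTruth, Optional[Valuation], Valuation], bool]


def accepts(delta: Dict[Tuple[str, str], Rule],
            init: Callable[[StateTruth], bool],
            final: Set[str],
            word: List[Tuple[str, Valuation]]) -> bool:
    """Evaluate the run of an ADA over a data word whose values are known."""
    n = len(word)

    @functools.lru_cache(maxsize=None)
    def holds(q: str, k: int) -> bool:
        # Truth value of state q once the first k events have been read.
        if k == n:
            return q in final              # final -> true, non-final -> false
        a, cur = word[k]
        past = word[k - 1][1] if k > 0 else None
        rule = delta.get((q, a))
        if rule is None:
            return False                   # missing rules are bottom
        return rule(lambda s: holds(s, k + 1), past, cur)

    return init(lambda q: holds(q, 0))


# The automaton of Fig. 1: `p` holds the past values, `c` the current ones.
FIG1_DELTA: Dict[Tuple[str, str], Rule] = {
    ("q0", "a"): lambda st, p, c: st("q1") and st("q2") and c["x"] == 0 and c["y"] == 0,
    ("q1", "a"): lambda st, p, c: (st("q1") and st("q2")
                                   and c["x"] == p["y"] + 1 and c["y"] == p["x"] + 1),
    ("q2", "a"): lambda st, p, c: st("q2") and c["x"] > p["x"] and c["y"] > p["y"],
    ("q1", "b"): lambda st, p, c: st("q3") and p["x"] >= p["y"],
    ("q2", "b"): lambda st, p, c: st("q4") and p["x"] > p["y"],
}
FIG1_FINAL = {"q3", "q4"}


def fig1_init(st: StateTruth) -> bool:
    return st("q0")                        # iota = q0


word = [("a", {"x": 0, "y": 0}), ("a", {"x": 1, "y": 1}), ("b", {"x": 2, "y": 1})]
print(accepts(FIG1_DELTA, fig1_init, FIG1_FINAL, word))  # False: x2 > y2 fails
```

Memoizing `holds` keeps the evaluation linear in the number of (state, position) pairs, even though the formula $\phi_n$ itself can grow exponentially.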
In this paper we tackle the following problems: 


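1. Boolean closure: given automata $\mathcal{A}_1$ and $\mathcal{A}_2$, both with the same set of variables
$\mathbf{x}$, do there exist automata $\mathcal{A}_{\cup}$, $\mathcal{A}_{\cap}$ and $\overline{\mathcal{A}_1}$ such that $\mathcal{L}(\mathcal{A}_{\cup}) = \mathcal{L}(\mathcal{A}_1) \cup \mathcal{L}(\mathcal{A}_2)$,
$\mathcal{L}(\mathcal{A}_{\cap}) = \mathcal{L}(\mathcal{A}_1) \cap \mathcal{L}(\mathcal{A}_2)$ and $\mathcal{L}(\overline{\mathcal{A}_1}) = \Sigma[\mathbf{x}]^* \setminus \mathcal{L}(\mathcal{A}_1)$?

2. emptiness: given an automaton $\mathcal{A}$, is $\mathcal{L}(\mathcal{A}) = \emptyset$?


It is well known that other problems, such as universality (given an automaton
$\mathcal{A}$ with variables $\mathbf{x}$, does $\mathcal{L}(\mathcal{A}) = \Sigma[\mathbf{x}]^*$?) and inclusion (given automata $\mathcal{A}_1$ and
$\mathcal{A}_2$ with the same set of variables, does $\mathcal{L}(\mathcal{A}_1) \subseteq \mathcal{L}(\mathcal{A}_2)$?), can be reduced to the
above problems. Observe furthermore that we do not consider cases in which the
sets of variables in the two automata differ. An interesting problem in this case
would be: given automata $\mathcal{A}_1$ and $\mathcal{A}_2$, with variables $\mathbf{x}_1$ and $\mathbf{x}_2$, respectively, such
that $\mathbf{x}_1 \subseteq \mathbf{x}_2$, does $\mathcal{L}(\mathcal{A}_1) \subseteq \mathcal{L}(\mathcal{A}_2)\downarrow_{\mathbf{x}_1}$ hold, where $\mathcal{L}(\mathcal{A}_2)\downarrow_{\mathbf{x}_1}$ is the projection of the set
of words $\mathcal{L}(\mathcal{A}_2)$ onto the variables $\mathbf{x}_1$? We leave this problem for future work.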


3.1 Boolean Closure 


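Given a set $Q$ of Boolean variables and a set $\mathbf{x}$ of variables of sort Data, for a
formula $\phi \in \mathsf{Form}^{+}(Q, \mathbf{x})$, with no negated occurrences of the Boolean variables,
we define the formula $\widetilde{\phi} \in \mathsf{Form}^{+}(Q, \mathbf{x})$ recursively on the structure of $\phi$:

$$
\begin{aligned}
&\widetilde{\phi_1 \vee \phi_2} = \widetilde{\phi_1} \wedge \widetilde{\phi_2}
&&\widetilde{\phi_1 \wedge \phi_2} = \widetilde{\phi_1} \vee \widetilde{\phi_2}
&&\widetilde{\neg\phi} = \neg\widetilde{\phi} \text{ if } \phi \text{ is not an atom} \\
&\widetilde{\phi} = \phi \text{ if } \phi \in Q
&&\widetilde{\phi} = \neg\phi \text{ if } \phi \notin Q \text{ is an atom} &&
\end{aligned}
$$

We have $|\widetilde{\phi}| = |\phi|$, for every formula $\phi \in \mathsf{Form}^{+}(Q, \mathbf{x})$.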
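To illustrate the definition of $\widetilde{\phi}$, the Python sketch below dualizes a formula given as a small syntax tree. It then checks by brute force the identity "$\widetilde{\phi}$ holds when exactly the states in $S$ are true iff $\phi$ fails when exactly the states in $Q \setminus S$ are true". In our reading, this identity is what makes swapping final states and dualizing rules complement the language. Several simplifications are our own: the tuple encoding, treating data atoms as opaque Boolean inputs, and applying the $\neg$ rule to every negated subformula.

```python
from itertools import product
from typing import Dict, List, Set

# Our own tuple encoding of formulae:
#   ("state", q)    a Boolean state q in Q
#   ("atom", name)  any other atom (top, bottom or a data atom), read from `env`
#   ("not", f), ("and", f, g), ("or", f, g)
Formula = tuple


def dual(f: Formula) -> Formula:
    """The formula phi-tilde, computed by structural recursion."""
    tag = f[0]
    if tag == "or":
        return ("and", dual(f[1]), dual(f[2]))
    if tag == "and":
        return ("or", dual(f[1]), dual(f[2]))
    if tag == "not":
        return ("not", dual(f[1]))     # applied to every negated subformula
    if tag == "state":
        return f                       # states are kept unchanged
    return ("not", f)                  # atoms outside Q are negated


def evaluate(f: Formula, states: Set[str], env: Dict[str, bool]) -> bool:
    tag = f[0]
    if tag == "or":
        return evaluate(f[1], states, env) or evaluate(f[2], states, env)
    if tag == "and":
        return evaluate(f[1], states, env) and evaluate(f[2], states, env)
    if tag == "not":
        return not evaluate(f[1], states, env)
    if tag == "state":
        return f[1] in states
    return env[f[1]]


def check_duality(f: Formula, Q: List[str], atoms: List[str]) -> bool:
    """Check that dual(f) holds under S iff f fails under Q minus S, for all S."""
    for bits in product([False, True], repeat=len(Q)):
        S = {q for q, b in zip(Q, bits) if b}
        for vals in product([False, True], repeat=len(atoms)):
            env = dict(zip(atoms, vals))
            if evaluate(dual(f), S, env) == evaluate(f, set(Q) - S, env):
                return False
    return True


phi = ("and", ("state", "q1"), ("or", ("state", "q2"), ("atom", "x>0")))
print(dual(phi))  # ('or', ('state', 'q1'), ('and', ('state', 'q2'), ('not', ('atom', 'x>0'))))
print(check_duality(phi, ["q1", "q2"], ["x>0"]))  # True
```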
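In the following let $\mathcal{A}_i = \langle \mathbf{x}, Q_i, \iota_i, F_i, \Delta_i \rangle$, for $i = 1, 2$, where w.l.o.g. we
assume that $Q_1 \cap Q_2 = \emptyset$. We define:

$$
\begin{aligned}
\mathcal{A}_{\cup} &= \langle \mathbf{x}, Q_1 \cup Q_2, \iota_1 \vee \iota_2, F_1 \cup F_2, \Delta_1 \cup \Delta_2 \rangle \\
\mathcal{A}_{\cap} &= \langle \mathbf{x}, Q_1 \cup Q_2, \iota_1 \wedge \iota_2, F_1 \cup F_2, \Delta_1 \cup \Delta_2 \rangle \\
\overline{\mathcal{A}_1} &= \langle \mathbf{x}, Q_1, \widetilde{\iota_1}, Q_1 \setminus F_1, \widetilde{\Delta_1} \rangle
\end{aligned}
$$

where $\widetilde{\Delta_1}(q, a) = \widetilde{\Delta_1(q, a)}$, for all $q \in Q_1$ and $a \in \Sigma$. The following lemma shows
the correctness of the above definitions: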


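Lemma 1. Given automata $\mathcal{A}_i = \langle \mathbf{x}, Q_i, \iota_i, F_i, \Delta_i \rangle$, for $i = 1, 2$, such that $Q_1 \cap Q_2 = \emptyset$,
we have $\mathcal{L}(\mathcal{A}_{\cup}) = \mathcal{L}(\mathcal{A}_1) \cup \mathcal{L}(\mathcal{A}_2)$, $\mathcal{L}(\mathcal{A}_{\cap}) = \mathcal{L}(\mathcal{A}_1) \cap \mathcal{L}(\mathcal{A}_2)$ and
$\mathcal{L}(\overline{\mathcal{A}_1}) = \Sigma[\mathbf{x}]^* \setminus \mathcal{L}(\mathcal{A}_1)$.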


It is easy to see that $|\mathcal{A}_{\cup}| = |\mathcal{A}_{\cap}| = |\mathcal{A}_1| + |\mathcal{A}_2|$ and $|\overline{\mathcal{A}_1}| = |\mathcal{A}_1|$; thus
the automata for the Boolean operations, including complementation, can be
built in linear time. This matches the linear-time bounds for intersection and
complementation of alternating automata over finite alphabets [3].
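The constructions can be exercised on the data-free fragment, where a word is just a sequence of input events. The Python sketch below uses our own encoding, in which a missing rule stands for $\bot$. It builds $\mathcal{A}_{\cup}$, $\mathcal{A}_{\cap}$ and $\overline{\mathcal{A}_1}$ for two small automata and checks the statement of Lemma 1 on every word of length at most four. This is a sanity check rather than a proof, and it ignores data variables entirely.

```python
from itertools import product

# Data-free ADA in our own encoding: (Q, iota, F, delta).  Formulae use
# ("state", q) | ("const", b) | ("not", f) | ("and", f, g) | ("or", f, g),
# and a missing entry delta[(q, a)] stands for bottom.
FALSE = ("const", False)


def dual(f):
    tag = f[0]
    if tag == "or":
        return ("and", dual(f[1]), dual(f[2]))
    if tag == "and":
        return ("or", dual(f[1]), dual(f[2]))
    if tag == "not":
        return ("not", dual(f[1]))
    if tag == "state":
        return f
    return ("not", f)                           # top and bottom are atoms outside Q


def union(A1, A2):
    (Q1, i1, F1, d1), (Q2, i2, F2, d2) = A1, A2
    assert not Q1 & Q2, "state sets must be disjoint"
    return (Q1 | Q2, ("or", i1, i2), F1 | F2, {**d1, **d2})


def intersection(A1, A2):
    (Q1, i1, F1, d1), (Q2, i2, F2, d2) = A1, A2
    assert not Q1 & Q2, "state sets must be disjoint"
    return (Q1 | Q2, ("and", i1, i2), F1 | F2, {**d1, **d2})


def complement(A, sigma):
    Q, i, F, d = A
    # Every rule is dualized, including missing ones: bottom becomes not-bottom.
    dd = {(q, a): dual(d.get((q, a), FALSE)) for q in Q for a in sigma}
    return (Q, dual(i), Q - F, dd)


def evaluate(f, val):
    tag = f[0]
    if tag == "or":
        return evaluate(f[1], val) or evaluate(f[2], val)
    if tag == "and":
        return evaluate(f[1], val) and evaluate(f[2], val)
    if tag == "not":
        return not evaluate(f[1], val)
    if tag == "state":
        return val(f[1])
    return f[1]                                 # ("const", b)


def accepts(A, word):
    Q, i, F, d = A

    def holds(q, k):
        if k == len(word):
            return q in F
        return evaluate(d.get((q, word[k]), FALSE), lambda p: holds(p, k + 1))

    return evaluate(i, lambda q: holds(q, 0))


SIGMA = ["a", "b"]
# A1 accepts the words ending with a; A2 (universal on a) those with at most one a.
A1 = ({"p0", "p1"}, ("state", "p0"), {"p1"},
      {("p0", "a"): ("or", ("state", "p0"), ("state", "p1")),
       ("p0", "b"): ("state", "p0")})
A2 = ({"r0", "r1"}, ("state", "r0"), {"r0", "r1"},
      {("r0", "a"): ("and", ("state", "r0"), ("state", "r1")),
       ("r0", "b"): ("state", "r0"),
       ("r1", "b"): ("state", "r1")})

U, I, C = union(A1, A2), intersection(A1, A2), complement(A1, SIGMA)
for n in range(5):
    for w in product(SIGMA, repeat=n):
        assert accepts(U, w) == (accepts(A1, w) or accepts(A2, w))
        assert accepts(I, w) == (accepts(A1, w) and accepts(A2, w))
        assert accepts(C, w) == (not accepts(A1, w))
print("checked Lemma 1 on all words of length <= 4")
```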


4 Antichains and Interpolants for Emptiness 


The emptiness problem for ADA is undecidable, even in very simple cases. For
instance, if $\mathsf{Data}^{\mathcal{I}}$ is the set of positive integers, an ADA can simulate an Alternating
Vector Addition System with States (AVASS) using only atoms $x > k$
and $x \approx \bar{x} + k$, for $k \in \mathbb{Z}$, with the classical interpretation of the function symbols
on integers. Since reachability of a control state is undecidable for AVASS [15],
ADA emptiness is undecidable.

Consequently, we give up on the guarantee for termination and build semi- 
algorithms that meet the requirements below: 


(i) given an automaton $\mathcal{A}$, if $\mathcal{L}(\mathcal{A}) \neq \emptyset$, the procedure will terminate and return
a word $w \in \mathcal{L}(\mathcal{A})$, and
(ii) if the procedure terminates without returning such a word, then $\mathcal{L}(\mathcal{A}) = \emptyset$.


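Let us fix an automaton $\mathcal{A} = \langle \mathbf{x}, Q, \iota, F, \Delta \rangle$ whose (finite) input event
alphabet is $\Sigma$, for the rest of this section. Given a formula $\phi \in \mathsf{Form}^{+}(Q, \mathbf{x})$
and an input event $a \in \Sigma$, we define the post-image function $\mathsf{Post}_{\mathcal{A}}(\phi, a) =
\exists \bar{\mathbf{x}} \,.\, \Delta(\phi[\bar{\mathbf{x}}/\mathbf{x}], a) \in \mathsf{Form}^{+}(Q, \mathbf{x})$, mapping each formula in $\mathsf{Form}^{+}(Q, \mathbf{x})$ to a formula
defining the effect of reading the event $a$. We generalize the post-image
function to finite sequences of input events, as follows:

$$
\mathsf{Post}_{\mathcal{A}}(\phi, \varepsilon) = \phi \qquad
\mathsf{Post}_{\mathcal{A}}(\phi, ua) = \mathsf{Post}_{\mathcal{A}}(\mathsf{Post}_{\mathcal{A}}(\phi, u), a)
$$
$$
\mathsf{Acc}_{\mathcal{A}}(u) = \mathsf{Post}_{\mathcal{A}}(\iota, u) \wedge \bigwedge_{q \in Q \setminus F} (q \rightarrow \bot), \text{ for any } u \in \Sigma^*
$$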

Then the emptiness problem for $\mathcal{A}$ amounts to the question: does there exist $u \in \Sigma^*$ such that
the formula $\mathsf{Acc}_{\mathcal{A}}(u)$ is satisfiable? (A positive answer means $\mathcal{L}(\mathcal{A}) \neq \emptyset$.) Observe that,
since we ask a satisfiability query, the final states of $\mathcal{A}$ need not be constrained². A naïve
semi-algorithm enumerates all finite sequences and checks the satisfiability of $\mathsf{Acc}_{\mathcal{A}}(u)$ for each
$u \in \Sigma^*$, using a decision procedure for the theory $\mathbf{T}(\mathbf{S}, \mathcal{I})$.

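Since no Boolean variable from $Q$ occurs under negation in $\phi$, it is easy
to prove the following monotonicity property: given two formulae $\phi, \psi \in
\mathsf{Form}^{+}(Q, \mathbf{x})$, if $\phi \models \psi$ then $\mathsf{Post}_{\mathcal{A}}(\phi, u) \models \mathsf{Post}_{\mathcal{A}}(\psi, u)$, for any $u \in \Sigma^*$. This
suggests an improvement of the above semi-algorithm that enumerates and
stores only a set $U \subseteq \Sigma^*$ for which $\{\mathsf{Post}_{\mathcal{A}}(\iota, u) \mid u \in U\}$ forms an antichain³
w.r.t. the entailment partial order. This is because, for any $u, v \in \Sigma^*$, if
$\mathsf{Post}_{\mathcal{A}}(\iota, u) \models \mathsf{Post}_{\mathcal{A}}(\iota, v)$ and $\mathsf{Acc}_{\mathcal{A}}(uw)$ is satisfiable for some $w \in \Sigma^*$, then
$\mathsf{Post}_{\mathcal{A}}(\iota, uw) \models \mathsf{Post}_{\mathcal{A}}(\iota, vw)$, thus $\mathsf{Acc}_{\mathcal{A}}(vw)$ is satisfiable as well, and there is
no need for $u$, since the non-emptiness of $\mathcal{A}$ can be proved using $v$ alone. However,
even with this optimization, the enumeration of sequences from $\Sigma^*$ diverges
in many real cases, because infinite antichains exist in many interpretations, e.g.
$q \wedge x \approx 0, q \wedge x \approx 1, \ldots$ for $\mathsf{Data}^{\mathcal{I}} = \mathbb{N}$.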
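For intuition, the antichain idea can be sketched in Python for the data-free special case ($\mathbf{x} = \emptyset$). There, a positive Boolean formula over $Q$ can be represented by its set of models, and entailment becomes model-set inclusion. Since there are only finitely many such formulae up to equivalence, the exploration terminates. With data, as noted above, infinite antichains can make it diverge. The representation, the names, and the data-free variant of Fig. 1 used below are our own assumptions.

```python
from collections import deque
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Model = FrozenSet[str]            # the set of states assigned true
Formula = FrozenSet[Model]        # a positive formula, given by its models
Rule = Callable[[Model], bool]    # a data-free, positive rule Delta(q, a)


def subsets(Q: List[str]) -> List[Model]:
    return [frozenset(c) for r in range(len(Q) + 1) for c in combinations(Q, r)]


def post(phi: Formula, a: str, Q: List[str], delta: Dict[Tuple[str, str], Rule]) -> Formula:
    """Models of phi[Delta(q, a)/q]: m is a model iff the set of states whose
    rule Delta(q, a) holds under m is a model of phi (missing rules are bottom)."""
    return frozenset(
        m for m in subsets(Q)
        if frozenset(q for q in Q if (q, a) in delta and delta[(q, a)](m)) in phi)


def antichain_search(Q, sigma, init: Formula, final,
                     delta) -> Optional[Tuple[str, ...]]:
    """Return some u with Acc(u) satisfiable, or None if the language is empty."""
    F = frozenset(final)
    stored: List[Formula] = [init]
    queue = deque([((), init)])
    while queue:
        u, phi = queue.popleft()
        # Models are upward closed, so a model with all non-final states false
        # exists iff F itself is a model.
        if F in phi:
            return u
        for a in sigma:
            nxt = post(phi, a, Q, delta)
            # Skip u.a if Post(iota, u.a) entails an already stored post-image.
            if any(nxt <= psi for psi in stored):
                continue
            stored.append(nxt)
            queue.append((u + (a,), nxt))
    return None


# Fig. 1 with its data constraints dropped; without data, the word ab is accepted.
Q = ["q0", "q1", "q2", "q3", "q4"]
DELTA = {
    ("q0", "a"): lambda m: {"q1", "q2"} <= m,
    ("q1", "a"): lambda m: {"q1", "q2"} <= m,
    ("q2", "a"): lambda m: "q2" in m,
    ("q1", "b"): lambda m: "q3" in m,
    ("q2", "b"): lambda m: "q4" in m,
}
INIT = frozenset(m for m in subsets(Q) if "q0" in m)     # iota = q0
print(antichain_search(Q, ["a", "b"], INIT, {"q3", "q4"}, DELTA))  # ('a', 'b')
```

Only newly generated post-images are pruned here. A fuller implementation might also evict stored formulae that entail the new one, so that the stored set remains a true antichain.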

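A safety invariant for $\mathcal{A}$ is a function $I : (Q \rightarrow \mathbb{B}) \rightarrow 2^{\mathbf{x} \mapsto \mathsf{Data}^{\mathcal{I}}}$ such that, for
every Boolean valuation $\beta : Q \rightarrow \mathbb{B}$, every valuation $\nu : \mathbf{x} \rightarrow \mathsf{Data}^{\mathcal{I}}$ of the data
variables and every finite sequence $u \in \Sigma^*$ of input events, the following hold:

1. $\mathcal{I}, \beta \cup \nu \models \mathsf{Post}_{\mathcal{A}}(\iota, u) \Rightarrow \nu \in I(\beta)$, and
2. $\nu \in I(\beta) \Rightarrow \mathcal{I}, \beta \cup \nu \not\models \mathsf{Acc}_{\mathcal{A}}(u)$.

If $I$ satisfies only the first point above, we call it an invariant. Intuitively, a
safety invariant maps every Boolean valuation into a set of data valuations
that contains the initial configuration $\iota \equiv \mathsf{Post}_{\mathcal{A}}(\iota, \varepsilon)$, whose data variables are
unconstrained, over-approximates the set of reachable valuations (point 1) and
excludes the valuations satisfying the acceptance condition (point 2). A formula
$\phi(Q, \mathbf{x})$ is said to define $I$ iff for all $\beta : Q \rightarrow \mathbb{B}$ and $\nu : \mathbf{x} \rightarrow \mathsf{Data}^{\mathcal{I}}$, we have
$\mathcal{I}, \beta \cup \nu \models \phi$ iff $\nu \in I(\beta)$.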


Lemma 2. For any automaton $\mathcal{A}$, we have $\mathcal{L}(\mathcal{A}) = \emptyset$ if and only if $\mathcal{A}$ has a
safety invariant.


Turning back to the issue of divergence of language emptiness semi-algorithms
in the case $\mathcal{L}(\mathcal{A}) = \emptyset$, we can observe that an enumeration of input
sequences $u_1, u_2, \ldots \in \Sigma^*$ can stop at step $k$ as soon as $\bigvee_{i=1}^{k} \mathsf{Post}_{\mathcal{A}}(\iota, u_i)$ defines
a safety invariant for $\mathcal{A}$. Although this condition can be effectively checked using
a decision procedure for the theory $\mathbf{T}(\mathbf{S}, \mathcal{I})$, there is no guarantee that this check
will ever succeed.

The solution we adopt in the sequel is abstraction, which ensures the termination
of invariant computations. However, it is worth pointing out from the start that
abstraction alone only allows us to build invariants that are not necessarily
safety invariants. To meet the latter condition, we resort to counterexample
guided abstraction refinement (CEGAR).

2 Since each state occurs positively in $\mathsf{Acc}_{\mathcal{A}}(u)$, this formula has a model iff it has a
model with every $q \in F$ set to true.

3 Given a partial order $(D, \preceq)$, an antichain is a set $A \subseteq D$ such that $a \not\preceq b$ for any
distinct $a, b \in A$.

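Formally, we fix a set of formulae $\Pi \subseteq \mathsf{Form}(Q, \mathbf{x})$ such that $\bot \in \Pi$, and
refer to these formulae as predicates. Given a formula $\phi$, we denote by $\phi^{\sharp} =
\bigwedge\{\pi \in \Pi \mid \phi \models \pi\}$ the abstraction of $\phi$ w.r.t. the predicates in $\Pi$. The abstract
versions of the post-image and acceptance condition are defined as follows:

$$
\mathsf{Post}^{\sharp}_{\mathcal{A}}(\phi, \varepsilon) = \phi \qquad
\mathsf{Post}^{\sharp}_{\mathcal{A}}(\phi, ua) = \big(\mathsf{Post}_{\mathcal{A}}(\mathsf{Post}^{\sharp}_{\mathcal{A}}(\phi, u), a)\big)^{\sharp}
$$
$$
\mathsf{Acc}^{\sharp}_{\mathcal{A}}(u) = \mathsf{Post}^{\sharp}_{\mathcal{A}}(\iota, u) \wedge \bigwedge_{q \in Q \setminus F} (q \rightarrow \bot), \text{ for any } u \in \Sigma^*
$$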
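The abstraction $\phi^{\sharp}$ can be illustrated with a small Python sketch. It assumes formulae over one state $q$ and one integer variable $x$, given as predicates. The entailment check $\phi \models \pi$, which would normally be an SMT query, is replaced by a hypothetical check on a bounded sample of valuations. That bounded check is for illustration only and is not sound for the integers. The example also shows the role of $\bot \in \Pi$: an unsatisfiable $\phi$ entails $\bot$, so $\phi^{\sharp} = \bot$. A formula that entails no predicate is abstracted to the empty conjunction, $\top$.

```python
from itertools import product
from typing import Callable, Dict, List

Valuation = Dict[str, int]            # {"q": 0 or 1, "x": integer}
Pred = Callable[[Valuation], bool]

# Hypothetical stand-in for an SMT entailment query: phi |= psi is only checked
# on a bounded sample of valuations, which is NOT sound for the integers.
SAMPLE = [{"q": q, "x": x} for q, x in product([0, 1], range(-8, 9))]


def entails(phi: Pred, psi: Pred) -> bool:
    return all(psi(v) for v in SAMPLE if phi(v))


def abstraction(phi: Pred, preds: Dict[str, Pred]) -> List[str]:
    """Names of the predicates pi in Pi with phi |= pi; phi-sharp is their conjunction."""
    return sorted(name for name, pi in preds.items() if entails(phi, pi))


PI: Dict[str, Pred] = {
    "bot": lambda v: False,           # bottom is always in Pi
    "q": lambda v: v["q"] == 1,
    "x>=0": lambda v: v["x"] >= 0,
    "x<=1": lambda v: v["x"] <= 1,
    "even(x)": lambda v: v["x"] % 2 == 0,
}

print(abstraction(lambda v: v["q"] == 1 and v["x"] == 2, PI))  # ['even(x)', 'q', 'x>=0']
print(abstraction(lambda v: False, PI))  # every name, including 'bot', so phi-sharp = bottom
```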


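Lemma 3. For any bijection $\mu : \mathbb{N} \rightarrow \Sigma^*$, there exists $k \geq 0$ such that
$\bigvee_{i=0}^{k} \mathsf{Post}^{\sharp}_{\mathcal{A}}(\iota, \mu(i))$ defines an invariant $I^{\sharp}$ for $\mathcal{A}$.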


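We are left with fulfilling point (2) from the definition of a safety invariant. To
this end, suppose that, for a given set $\Pi$ of predicates, the invariant $I^{\sharp}$ defined
by the previous lemma meets point (1) but not point (2), where $\mathsf{Post}^{\sharp}_{\mathcal{A}}$ and
$\mathsf{Acc}^{\sharp}_{\mathcal{A}}$ replace $\mathsf{Post}_{\mathcal{A}}$ and $\mathsf{Acc}_{\mathcal{A}}$, respectively. In other words, there exists a finite
sequence $u \in \Sigma^*$ such that $\nu \in I^{\sharp}(\beta)$ and $\mathcal{I}, \beta \cup \nu \models \mathsf{Acc}^{\sharp}_{\mathcal{A}}(u)$, for some Boolean
valuation $\beta : Q \rightarrow \mathbb{B}$ and data valuation $\nu : \mathbf{x} \rightarrow \mathsf{Data}^{\mathcal{I}}$. Such a $u \in \Sigma^*$ is called a
counterexample.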

Once a counterexample $u$ is discovered, there are two possibilities. Either
(i) $\mathsf{Acc}_{\mathcal{A}}(u)$ is satisfiable, in which case $u$ is feasible and $\mathcal{L}(\mathcal{A}) \neq \emptyset$, or (ii) $\mathsf{Acc}_{\mathcal{A}}(u)$
is unsatisfiable, in which case $u$ is spurious. In the first case, our semi-algorithm
stops and returns a witness for non-emptiness, obtained from the satisfying valuation
of $\mathsf{Acc}_{\mathcal{A}}(u)$. In the second case, we must strengthen the invariant by
excluding from $I^{\sharp}$ all pairs $(\beta, \nu)$ such that $\mathcal{I}, \beta \cup \nu \models \mathsf{Acc}^{\sharp}_{\mathcal{A}}(u)$. This strengthening
is carried out by adding to $\Pi$ several predicates that are sufficient to exclude
the spurious counterexample.

Given an unsatisfiable conjunction of formulae $\psi_1 \wedge \ldots \wedge \psi_n$, an interpolant
is a tuple of formulae $\langle I_0, I_1, \ldots, I_{n-1}, I_n \rangle$ such that $I_0 = \top$, $I_n = \bot$,
$I_{i-1} \wedge \psi_i \models I_i$ for all $i \in [1, n]$, and $I_i$ contains only variables and function symbols
that are common to $\psi_i$ and $\psi_{i+1}$, for all $i \in [1, n-1]$. Moreover, by Lyndon's
Interpolation Theorem [16], we can assume without loss of generality that every Boolean
variable with at least one positive (negative) occurrence in $I_i$ has at least one positive
(negative) occurrence in both $\psi_i$ and $\psi_{i+1}$. In the following, we shall assume the existence
of an interpolating decision procedure for $\mathbf{T}(\mathbf{S}, \mathcal{I})$ that meets the requirements
of Lyndon's Interpolation Theorem.

A classical method for abstraction refinement is to add the elements of the
interpolant obtained from a proof of spuriousness to the set of predicates. This
guarantees progress, meaning that the particular spurious counterexample from
which the interpolant was generated will never be revisited in the future. Although
this does not always happen, in many practical test cases this progress property
eventually yields a safety invariant.
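Given a non-empty spurious counterexample $u = a_1 \ldots a_n$, where $n > 0$, we
consider the following interpolation problem:

$$
\Theta(u) = \theta_0(Q_0) \wedge \theta_1(Q_0 \cup Q_1, \mathbf{x}_0 \cup \mathbf{x}_1) \wedge \ldots \wedge \theta_n(Q_{n-1} \cup Q_n, \mathbf{x}_{n-1} \cup \mathbf{x}_n) \wedge \theta_{n+1}(Q_n) \qquad (1)
$$

where $Q_k = \{q_k \mid q \in Q\}$, $k \in [0, n]$, are time-stamped sets of Boolean variables
corresponding to the set $Q$ of states of $\mathcal{A}$. The first conjunct $\theta_0(Q_0) = \iota[Q_0/Q]$
is the initial configuration of $\mathcal{A}$, with every $q \in FV^{\mathsf{Bool}}(\iota)$ replaced by $q_0$. The
definition of $\theta_k$, for all $k \in [1, n]$, uses replacement sets $R_\ell \subseteq Q_\ell$, $\ell \in [0, n]$, which
are defined inductively below:

- $R_0 = FV^{\mathsf{Bool}}(\theta_0)$,

- $\theta_\ell = \bigwedge_{q_{\ell-1} \in R_{\ell-1}} \big(q_{\ell-1} \rightarrow \Delta(q, a_\ell)[Q_\ell/Q, \mathbf{x}_{\ell-1}/\bar{\mathbf{x}}, \mathbf{x}_\ell/\mathbf{x}]\big)$ and $R_\ell =
FV^{\mathsf{Bool}}(\theta_\ell) \cap Q_\ell$, for each $\ell \in [1, n]$,

- $\theta_{n+1}(Q_n) = \bigwedge_{q \in Q \setminus F} (q_n \rightarrow \bot)$.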
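The following Python sketch illustrates the replacement sets. It builds the conjuncts $\theta_0, \ldots, \theta_{n+1}$ of $\Theta(u)$ as text for the automaton of Fig. 1 and the sequence $u = aab$. It assumes, as holds in Fig. 1, that $\iota$ and every rule are conjunctions of states and data constraints, so that $R_\ell$ is just the set of time-stamped successor states. Checking unsatisfiability and computing the interpolant would be left to an interpolating SMT solver, which is not modelled here. The template syntax and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Simplifying assumption (ours): iota and every rule Delta(q, a) are conjunctions
# of successor states and data constraints.  In a constraint template, {p} and {c}
# stand for the time stamps of the past (x-bar) and current (x) values.
Rule = Tuple[List[str], List[str]]            # (successor states, constraints)


def build_theta(init: List[str], delta: Dict[Tuple[str, str], Rule],
                Q: List[str], final: Set[str],
                u: List[str]) -> Tuple[List[str], List[Set[str]]]:
    """Return the conjuncts theta_0, ..., theta_{n+1} as text and R_0, ..., R_n."""
    R = [{f"{q}_0" for q in init}]
    thetas = [" ∧ ".join(sorted(R[0]))]        # theta_0 = iota[Q_0/Q]
    for ell, a in enumerate(u, start=1):
        implications, nxt = [], set()
        for qt in sorted(R[ell - 1]):          # qt is q_{ell-1}
            q = qt.rsplit("_", 1)[0]
            succ, cons = delta.get((q, a), (None, None))
            if succ is None:                   # missing rule: Delta(q, a) = bottom
                implications.append(f"({qt} → ⊥)")
                continue
            body = [f"{s}_{ell}" for s in succ] + [t.format(p=ell - 1, c=ell) for t in cons]
            nxt.update(f"{s}_{ell}" for s in succ)   # R_ell = FV(theta_ell) ∩ Q_ell
            implications.append(f"({qt} → {' ∧ '.join(body) or '⊤'})")
        thetas.append(" ∧ ".join(implications))
        R.append(nxt)
    n = len(u)
    thetas.append(" ∧ ".join(f"({q}_{n} → ⊥)" for q in Q if q not in final))
    return thetas, R


FIG1: Dict[Tuple[str, str], Rule] = {
    ("q0", "a"): (["q1", "q2"], ["x{c} = 0", "y{c} = 0"]),
    ("q1", "a"): (["q1", "q2"], ["x{c} = y{p} + 1", "y{c} = x{p} + 1"]),
    ("q2", "a"): (["q2"], ["x{c} > x{p}", "y{c} > y{p}"]),
    ("q1", "b"): (["q3"], ["x{p} ≥ y{p}"]),
    ("q2", "b"): (["q4"], ["x{p} > y{p}"]),
}
thetas, R = build_theta(["q0"], FIG1, ["q0", "q1", "q2", "q3", "q4"],
                        {"q3", "q4"}, ["a", "a", "b"])
for k, t in enumerate(thetas):
    print(f"θ_{k} = {t}")
```

Unlike the run formulae $\phi_k$, each $\theta_\ell$ keeps the time-stamped states $q_{\ell-1}$ on the left of the implications, which preserves the control history the interpolant needs.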

The intuition is that $R_0, \ldots, R_n$ are the sets of states replaced, $\theta_0, \ldots, \theta_n$ are the
sets of transition rules fired on the run of $\mathcal{A}$ over $u$, and $\theta_{n+1}$ is the acceptance
condition, which forces the last remaining non-final states to be false. We recall
that a run of $\mathcal{A}$ over $u$ is a sequence:

$$\phi_0(Q) \Rightarrow \phi_1(Q, \mathbf{x}_0 \cup \mathbf{x}_1) \Rightarrow \ldots \Rightarrow \phi_n(Q, \mathbf{x}_0 \cup \ldots \cup \mathbf{x}_n)$$

where $\phi_0$ is the initial configuration $\iota$ and, for each $k > 0$, $\phi_k$ is obtained from $\phi_{k-1}$
by replacing each state $q \in FV^{\mathsf{Bool}}(\phi_{k-1})$ by the formula $\Delta(q, a_k)[\mathbf{x}_{k-1}/\bar{\mathbf{x}}, \mathbf{x}_k/\mathbf{x}]$,
given by the transition function of $\mathcal{A}$. Observe that, because the states are
replaced with transition formulae when moving one step in a run, these formulae
lose track of the control history and are not suitable for producing interpolants
that relate states and data.

The main idea behind the above definition of the interpolation problem is
that we would like to obtain an interpolant $\langle \top, I_0(Q), I_1(Q, \mathbf{x}), \ldots, I_n(Q, \mathbf{x}), \bot \rangle$
whose formulae combine states with the data constraints that must hold locally,
whenever the control reaches a certain Boolean configuration. This association of
states with data valuations is key to defining efficient semi-algorithms
based on lazy abstraction [8]. Furthermore, the abstraction defined by the interpolants
generated in this way can also over-approximate the control structure of
an automaton, in addition to the sets of data values encountered throughout its
runs.

The correctness of this interpolation-based abstraction refinement setup is
captured by the progress property below, which guarantees that adding the
formulae of an interpolant for $\Theta(u)$ to the set $\Pi$ of predicates suffices to exclude
the spurious counterexample $u$ from future searches.


Lemma 4. For any sequence $u = a_1 \ldots a_n \in \Sigma^*$, if $\mathsf{Acc}_{\mathcal{A}}(u)$ is unsatisfiable,
the following hold:

1. $\Theta(u)$ is unsatisfiable, and
2. if $\langle \top, I_0, \ldots, I_n, \bot \rangle$ is an interpolant for $\Theta(u)$ such that $\{I_i \mid i \in [0, n]\} \subseteq \Pi$,
then $\mathsf{Acc}^{\sharp}_{\mathcal{A}}(u)$ is unsatisfiable.
5 Lazy Predicate Abstraction for ADA Emptiness


We now have all the ingredients to describe the first emptiness checking semi-algorithm
for alternating data automata. Algorithm⁴ 1 builds an abstract reachability
tree (ART) whose nodes are labeled with formulae over-approximating
the concrete sets of configurations, together with a covering relation between nodes that
ensures that the set of formulae labeling the nodes in the ART forms
an antichain. Any spurious counterexample is eliminated by computing an interpolant
and adding its formulae to the set of predicates (cf. Lemma 4). Formally,
an ART is a tuple $\mathcal{T} = \langle N, E, r, \Lambda, R, T, \lhd \rangle$, where:


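- $N$ is a set of nodes,

- $E \subseteq N \times \Sigma \times N$ is a set of edges,

- $r \in N$ is the root of the directed tree $(N, E)$,

- $\Lambda : N \rightarrow \mathsf{Form}(Q, \mathbf{x})$ is a labeling of the nodes with formulae, such that
$\Lambda(r) = \iota$,

- $R : N \rightarrow 2^{Q}$ is a labeling of nodes with replacement sets, such that $R(r) =
FV^{\mathsf{Bool}}(\iota)$,

- $T : E \rightarrow \bigcup_{i \geq 0} \mathsf{Form}^{+}(Q_i \cup Q_{i+1}, \mathbf{x}_i \cup \mathbf{x}_{i+1})$ is a labeling of edges with time-stamped
formulae, and

- $\lhd \subseteq N \times N$ is a set of covering edges.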


Each node $n \in N$ corresponds to a unique path from the root to $n$, labeled
by a sequence $\lambda(n) \in \Sigma^*$ of input events. The least infeasible suffix of $\lambda(n)$ is
the shortest sequence $v$ such that $\lambda(n) = wv$, for some $w \in \Sigma^*$, and
the following formula is unsatisfiable:

$$\Psi(v) = \Lambda(p)[Q_0/Q] \wedge \theta_1(Q_0 \cup Q_1, \mathbf{x}_0 \cup \mathbf{x}_1) \wedge \ldots \wedge \theta_{k+1}(Q_k) \qquad (2)$$

where $k = |v|$, $\theta_1, \ldots, \theta_{k+1}$ are defined as in (1) for the events of $v$, and $\theta_0 = \Lambda(p)[Q_0/Q]$. The
pivot of $n$ is the node $p$ corresponding to the start of the least infeasible
suffix. We assume the existence of two functions FindPivot$(u, \mathcal{T})$ and
LeastInfeasibleSuffix$(u, \mathcal{T})$ that return the pivot and least infeasible suffix
of a sequence $u \in \Sigma^*$ in an ART $\mathcal{T}$, without detailing their implementation.

With these considerations, Algorithm 1 uses a worklist iteration to build an
ART. We keep newly expanded nodes of $\mathcal{T}$ in a queue WorkList, thus implementing
a breadth-first exploration strategy, which guarantees that the shortest
counterexamples are explored first. When the search encounters a counterexample
candidate $u$, it is checked for spuriousness. If the counterexample is feasible,
the procedure returns a data word $w \in \mathcal{L}(\mathcal{A})$, which interleaves the input events
of $u$ with the data valuations from the model of $\mathsf{Acc}_{\mathcal{A}}(u)$ (since $u$ is feasible,
$\mathsf{Acc}_{\mathcal{A}}(u)$ is satisfiable). Otherwise, $u$ is spurious: we compute its pivot
$p$ (line 12), add the interpolants for the least infeasible suffix of $u$ to $\Pi$, and remove
and recompute the subtree of $\mathcal{T}$ rooted at $p$.

Termination of Algorithm 1 depends on the ability of a given interpolating 
decision procedure for the combined Boolean and data theory T(S, 7) to provide 


4 Though termination is not guaranteed, we call it an algorithm for conciseness.
Algorithm 1. Lazy Predicate Abstraction for ADA Emptiness

input: an ADA 𝒜 = (x, Q, ι, F, Δ) over the alphabet Σ of input events
output: true if L(𝒜) = ∅ and a data word w ∈ L(𝒜) otherwise

1: let 𝒯 = (N, E, r, Λ, ◁) be an ART
2: initially N = E = ◁ = ∅, Λ = {(r, ι)}, Π = {⊥}, WorkList = ⟨r⟩
3: while WorkList ≠ ∅ do
4:   dequeue n from WorkList
5:   N ← N ∪ {n}
6:   let λ(n) = a_1 ... a_k be the label of the path from r to n
7:   if Post(λ(n)) is satisfiable then                  ▷ counterexample candidate
8:     if Acc_𝒜(λ(n)) is satisfiable then               ▷ feasible counterexample
9:       get model (β, ν_1, ..., ν_k) of Acc_𝒜(λ(n))
10:      return w = (a_1, ν_1) ... (a_k, ν_k)           ▷ w ∈ L(𝒜) by construction
11:    else                                              ▷ spurious counterexample
12:      p ← FINDPIVOT(λ(n), 𝒯)
13:      v ← LEASTINFEASIBLESUFFIX(λ(n), 𝒯)
14:      Π ← Π ∪ {I_0, ..., I_k}, where (⊤, I_0, ..., I_k, ⊥) is an interpolant for Ψ(v)
15:      let 𝒮 = (N', E', p, Λ', ◁') be the subtree of 𝒯 rooted at p
16:      for (m, q) ∈ ◁ such that q ∈ N' do
17:        remove m from N and enqueue m into WorkList
18:      remove 𝒮 from 𝒯
19:      enqueue p into WorkList                         ▷ recompute the subtree rooted at p
20:  else
21:    for a ∈ Σ do                                      ▷ expand n
22:      φ ← Post^#_Π(Λ(n), a)
23:      if there exists m ∈ N such that φ ⊨ Λ(m) then
24:        ◁ ← ◁ ∪ {(n, m)}                             ▷ m covers n
25:      else
26:        let s be a fresh node
27:        E ← E ∪ {(n, a, s)}
28:        Λ ← Λ ∪ {(s, φ)}
29:        R ← {m ∈ WorkList | Λ(m) ⊨ φ}               ▷ worklist nodes covered by s
30:        for r ∈ R do
31:          for m ∈ N such that (m, b, r) ∈ E, b ∈ Σ do
32:            ◁ ← ◁ ∪ {(m, s)}                         ▷ redirect covered children from R into s
33:          for (m, r) ∈ ◁ do
34:            ◁ ← ◁ ∪ {(m, s)}                         ▷ redirect covered nodes from R into s
35:        remove R from 𝒯
36:        enqueue s into WorkList
37: return true


interpolants that yield a safety invariant whenever $L(\mathcal{A}) = \emptyset$. In this case, we use the covering relation $\lhd$ to ensure that, when a newly generated node is covered by a node already in $N$, it is not added to the worklist, thus cutting the current branch of the search.

Formally, for any two nodes $n, m \in N$, we have $n \lhd m$ iff $\mathit{Post}^{\#}_{\Pi}(\Lambda(n), a) \models \Lambda(m)$ for some $a \in \Sigma$. In other words, $n \lhd m$ holds if $n$ has a successor whose label entails the label of $m$.
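The covering test can be made concrete with a purely illustrative assumption: each formula is represented by the finite set of states that satisfy it. Under that assumption, entailment $\varphi \models \psi$ is set inclusion, and the empty set plays the role of $\bot$. In the sketch, `post(label, a)` is a hypothetical stand-in for $\mathit{Post}^{\#}_{\Pi}$. The function then follows the definition of $n \lhd m$ literally.

```python
from typing import Callable, FrozenSet, Iterable

Formula = FrozenSet[int]  # a formula, represented by its finite set of models


def entails(phi: Formula, psi: Formula) -> bool:
    """phi |= psi iff every model of phi is a model of psi."""
    return phi <= psi


def covers(label_n: Formula, label_m: Formula, alphabet: Iterable[str],
           post: Callable[[Formula, str], Formula]) -> bool:
    """n <| m iff post(Lambda(n), a) |= Lambda(m) for some input event a."""
    return any(entails(post(label_n, a), label_m) for a in alphabet)
```

In a real implementation each `entails` call would be an SMT validity query rather than a subset test. This is why the number of covering checks matters for performance.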


Example. Consider the automaton given in Fig. 1. First, Algorithm 1 fires the sequence $a$. Since there are no formulae other than $\bot$ in $\Pi$, the successor of $\iota = q_0$ is $\top$, in Fig. 2(a). The spuriousness check for $a$ yields the root of the ART as pivot and the interpolant $(q_0, q_1)$, which is added to the set $\Pi$. Then the $\top$ node is removed, and the next time $a$ is fired it creates a node labeled $q_1$. The second sequence $aa$ creates a successor node $q_1$, which is covered by the first (depicted with a dashed arrow) in Fig. 2(b). The third sequence is $ab$, which results in a new uncovered node $\top$ and triggers a spuriousness check. The new predicate obtained from this check is $x \le 0 \wedge q_2 \wedge y \ge 0$, and the pivot is again the root. The entire ART is then rebuilt with the new predicates, and the fourth sequence $aab$ yields an uncovered node $\top$, in Fig. 2(c). The new pivot is the endpoint of $a$, and the newly added predicates are $q_1 \wedge q_2$ and $y > x - 1 \wedge q_2$. Finally, the ART is rebuilt from the pivot node and all nodes are covered, which proves the emptiness of the automaton, in Fig. 2(d).
The correctness of Algorithm 1 is proved below: 


[Figure: four panels (a)-(d) showing the ARTs built by Algorithm 1, with pivot nodes, dashed covering edges and the predicates added after each spuriousness check. The predicate set evolves from $\Pi = \{\bot\}$ to $\Pi = \{\bot, q_0, q_1\}$, then to $\Pi = \{\bot, q_0, q_1, x \le 0 \wedge q_2 \wedge y \ge 0\}$, and finally to $\Pi = \{\bot, q_0, q_1, x \le 0 \wedge q_2 \wedge y \ge 0, q_1 \wedge q_2, y > x - 1 \wedge q_2\}$.]

Fig. 2. Proving emptiness of the automaton from Fig. 1 by Algorithm 1


Theorem 1. Given an automaton $\mathcal{A}$ such that $L(\mathcal{A}) \neq \emptyset$, Algorithm 1 terminates and returns a word $w \in L(\mathcal{A})$. If Algorithm 1 terminates reporting true, then $L(\mathcal{A}) = \emptyset$.


6 Checking ADA Emptiness with IMPACT 


As pointed out by a number of authors, the bottleneck of predicate abstraction is 
the high cost of reconstructing parts of the ART, subsequent to the refinement 
of the set of predicates. The main idea of the IMPACT procedure [17] is that 
this can be avoided and the refinement (strengthening of the node labels of the 
ART) can be performed in-place. This refinement step requires an update of the 
covering relation, because a node that used to cover another node might not 
cover it after the strengthening of its label. 

We consider a total alphabetical order $\prec$ on $\Sigma$ and lift it to the total lexicographical order $\prec^*$ on $\Sigma^*$. A node $n \in N$ is covered if $(n, p) \in \lhd$, or if it has an ancestor $m$ such that $(m, p) \in \lhd$, for some $p \in N$. A node $n$ is closed if it is covered, or if $\Lambda(n) \not\models \Lambda(m)$ for all $m \in N$ such that $\lambda(m) \prec^* \lambda(n)$. Observe that we use the covering relation $\lhd$ here with a different meaning than in Algorithm 1.
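The two notions can be sketched directly. The sketch rests on illustrative assumptions that are not part of the algorithm. The ART is stored as a parent map, and labels are modeled by finite sets of states, so that $\Lambda(n) \models \Lambda(m)$ is a subset test. The letter order $\prec$ is given by a rank table. Python's tuple comparison then gives the lexicographic lift $\prec^*$, in which a proper prefix precedes its extensions. The names `parent`, `path_label`, `label` and `cover` are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Node = int


def lex_less(u: str, v: str, rank: Dict[str, int]) -> bool:
    """u <* v: lexicographic lift of the letter order given by `rank`."""
    return tuple(rank[c] for c in u) < tuple(rank[c] for c in v)


def is_covered(n: Node, parent: Dict[Node, Optional[Node]],
               cover: Set[Tuple[Node, Node]]) -> bool:
    """n is covered if n or one of its ancestors is the source of a covering edge."""
    sources = {m for (m, _) in cover}
    cur: Optional[Node] = n
    while cur is not None:
        if cur in sources:
            return True
        cur = parent[cur]
    return False


def is_closed(n: Node, nodes: Set[Node], parent: Dict[Node, Optional[Node]],
              cover: Set[Tuple[Node, Node]], path_label: Dict[Node, str],
              label: Dict[Node, FrozenSet[int]], rank: Dict[str, int]) -> bool:
    """Covered, or Lambda(n) entails no Lambda(m) with lambda(m) <* lambda(n)."""
    if is_covered(n, parent, cover):
        return True
    return all(not label[n] <= label[m]
               for m in nodes
               if lex_less(path_label[m], path_label[n], rank))
```

Because $\prec^*$ is total, the candidates for covering a node are exactly the nodes whose path labels precede it. This is the set scanned by the CLOSE function of Algorithm 2.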

The execution of Algorithm 2 consists of three phases⁵: close, refine and expand. Let $n$ be a node removed from the worklist at line 4. If $\mathit{Acc}_{\mathcal{A}}(\lambda(n))$


5 Corresponding to the CLOSE, REFINE and EXPAND steps in [17].
Algorithm 2. IMPACT for ADA Emptiness

input: an ADA 𝒜 = (x, Q, ι, F, Δ) over the alphabet Σ of input events
output: true if L(𝒜) = ∅ and a data word w ∈ L(𝒜) otherwise

1: let 𝒯 = (N, E, r, Λ, R, T, ◁) be an ART
2: initially N = E = T = ◁ = ∅, Λ = {(r, ι)}, R = FV(ι[Q_0/Q]), WorkList = {r}
3: while WorkList ≠ ∅ do
4:   dequeue n from WorkList
5:   N ← N ∪ {n}
6:   let (r, a_1, n_1), (n_1, a_2, n_2), ..., (n_{k-1}, a_k, n_k) be the path from r to n
7:   if Acc_𝒜(a_1 ... a_k) is satisfiable then             ▷ counterexample is feasible
8:     get model (β, ν_1, ..., ν_k) of Acc_𝒜(λ(n))
9:     return w = (a_1, ν_1) ... (a_k, ν_k)                ▷ w ∈ L(𝒜) by construction
10:  else                                                   ▷ spurious counterexample
11:    let (⊤, I_0, ..., I_k, ⊥) be an interpolant for Θ(a_1 ... a_k)
12:    b ← false
13:    for i = 0, ..., k do
14:      if Λ(n_i) ⊭ I_i then
15:        ◁ ← ◁ \ {(m, n_i) ∈ ◁ | m ∈ N}
16:        Λ(n_i) ← Λ(n_i) ∧ I_i                           ▷ strengthen the label of n_i
17:        if ¬b then
18:          b ← CLOSE(n_i)
19:  if n is not covered then
20:    for a ∈ Σ do                                         ▷ expand n
21:      let s be a fresh node and e = (n, a, s) be a new edge
22:      E ← E ∪ {e}
23:      Λ ← Λ ∪ {(s, ⊤)}
24:      T ← T ∪ {(e, θ)}
25:      R ← R ∪ {(s, ⋃_{q ∈ R(n)} FV(Δ(q, a)))}
26:      enqueue s into WorkList
27: return true

1: function CLOSE(x) returns Bool
2:   for y ∈ N such that λ(y) ≺* λ(x) do
3:     if Λ(x) ⊨ Λ(y) then
4:       ◁ ← (◁ \ {(p, q) ∈ ◁ | q is x or a successor of x}) ∪ {(x, y)}
5:       return true
6:   return false


is satisfiable, the counterexample $\lambda(n)$ is feasible. In that case a model of $\mathit{Acc}_{\mathcal{A}}(\lambda(n))$ is obtained and a word $w \in L(\mathcal{A})$ is returned. Otherwise, $\lambda(n)$ is a spurious counterexample and the procedure enters the refinement phase (lines 11-18). The interpolant for $\Theta(\lambda(n))$ (cf. formula 1) is used to strengthen the labels of all the ancestors of $n$, by conjoining the formulae of the interpolant to the existing labels.

In this process, the nodes on the path between $r$ and $n$, including $n$, might become eligible for coverage. We therefore attempt to close each ancestor of $n$ that is impacted by the refinement (line 18). Observe that, in this case, the call to CLOSE must uncover each node that is covered by a successor of $n$ (line 4 of the CLOSE function). This is required because, due to the over-approximation of the sets of reachable configurations, the covering relation is not transitive, as explained in [17]. If CLOSE adds a covering edge $(n_i, m)$ to $\lhd$, it does not have to be called for the successors of $n_i$ on this path, which is handled via the Boolean flag $b$.
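The refinement loop (lines 12-18) and the CLOSE function can be sketched as follows. This is an illustration, not the authors' implementation, and it relies on several assumptions:

- Labels are modeled as finite sets of states, so $\models$ becomes set inclusion and conjunction becomes intersection.
- `smaller(y, x)` is a caller-supplied stand-in for $\lambda(y) \prec^* \lambda(x)$.
- "successor of $x$" is read as any node in the subtree below $x$.

All names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Node = int
Label = FrozenSet[int]          # a label, represented by its finite set of models
Cover = Set[Tuple[Node, Node]]  # covering edges (covered node, covering node)


def descendants(x: Node, children: Dict[Node, List[Node]]) -> Set[Node]:
    """x together with every node below it in the tree."""
    out: Set[Node] = set()
    stack = [x]
    while stack:
        y = stack.pop()
        out.add(y)
        stack.extend(children.get(y, []))
    return out


def close(x: Node, nodes: Set[Node], children: Dict[Node, List[Node]],
          label: Dict[Node, Label], cover: Cover,
          smaller: Callable[[Node, Node], bool]) -> bool:
    """CLOSE(x): cover x by the first y (in node order) with y <* x and Lambda(x) |= Lambda(y)."""
    for y in sorted(nodes):
        if smaller(y, x) and label[x] <= label[y]:
            below = descendants(x, children)
            # uncover nodes covered by x or by a node below x, then add (x, y)
            cover -= {(p, q) for (p, q) in cover if q in below}
            cover.add((x, y))
            return True
    return False


def refine(path: List[Node], interpolant: List[Label], nodes: Set[Node],
           children: Dict[Node, List[Node]], label: Dict[Node, Label],
           cover: Cover, smaller: Callable[[Node, Node], bool]) -> None:
    """Lines 12-18 for the path n_0 = r, ..., n_k = n and interpolants I_0, ..., I_k."""
    b = False
    for n_i, itp in zip(path, interpolant):
        if not label[n_i] <= itp:                              # Lambda(n_i) does not entail I_i
            cover -= {(m, q) for (m, q) in cover if q == n_i}  # n_i stops covering other nodes
            label[n_i] = label[n_i] & itp                      # conjunction = intersection
            if not b:
                b = close(n_i, nodes, children, label, cover, smaller)
```

The flag `b` mirrors the text: once some node on the path has been closed, its successors on the path are covered through their ancestor. CLOSE is therefore not attempted again for them.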

Finally, if $n$ is still uncovered (it has not been covered during the refinement phase), we expand $n$ (lines 20-26) by creating a new node for each successor $s$ via the input event $a \in \Sigma$ and inserting it into the worklist.
Fig. 3. Proving emptiness of the automaton from Fig. 1 by Algorithm 2


Example. We show the execution of Algorithm 2 on the automaton from Fig. 1. Initially, the procedure fires the sequence $a$, whose endpoint is labeled with $\top$, in Fig. 3(a). Since this node is uncovered, we check the spuriousness of the counterexample $a$ and refine the label of the node to $q_1$. Since the node is still uncovered, two successors labeled with $\top$ are computed, corresponding to the sequences $aa$ and $ab$, in Fig. 3(b). The spuriousness check for $aa$ yields the interpolant $(q_0, x \le 0 \wedge q_2 \wedge y \ge 0)$, which strengthens the label of the endpoint of $a$ from $q_1$ to $q_1 \wedge x \le 0 \wedge q_2 \wedge y \ge 0$. The sequence $ab$ is also found to be spurious. This changes the label of its endpoint from $\top$ to $\bot$ and also covers it (depicted with a dashed edge). Since the endpoint of $aa$ is not covered, it is expanded to $aaa$ and $aab$, in Fig. 3(c). Both sequences $aaa$ and $aab$ are found to be spurious, and the endpoint of $aab$, whose label has changed from $\top$ to $\bot$, is now covered. In the process, the label of $aa$ has also changed from $q_1$ to $q_1 \wedge y > x - 1 \wedge q_2$, due to the strengthening with the interpolant from $aab$. Finally, the only uncovered node $aaa$ is expanded to $aaaa$ and $aaab$, both of which are found to be spurious, in Fig. 3(d). The refinement of $aaab$ changes the label of $aaa$ from $q_1$ to $q_1 \wedge y > x - 1 \wedge q_2$, and this node is now covered by $aa$. Since its successors are also covered, there are no uncovered nodes and the procedure returns true.
The correctness of Algorithm 2 is stated by the theorem below:


Theorem 2. Given an automaton $\mathcal{A}$ such that $L(\mathcal{A}) \neq \emptyset$, Algorithm 2 terminates and returns a word $w \in L(\mathcal{A})$. If Algorithm 2 terminates reporting true, then $L(\mathcal{A}) = \emptyset$.


7 Experimental Evaluation 


We have implemented both Algorithms 1 and 2 in a prototype tool⁶ that uses the MathSAT5 SMT solver⁷ via the Java SMT interface⁸ for the satisfiability queries and interpolant generation, in the theory of linear integer arithmetic with uninterpreted Boolean functions (UFLIA). We compared both algorithms with a previous implementation of a trace inclusion procedure, called INCLUDER⁹, which uses on-the-fly determinisation and lazy predicate abstraction with interpolant-based refinement [11] in the LIA theory. The datasets generated and/or analysed during the current study are available in the figshare repository: https://doi.org/10.6084/m9.figshare.5925472.v1 [12].


Table 1.
Example         | Size (bytes) | L(A) = ∅? | Algorithm 1 (sec) | Algorithm 2 (sec) | INCLUDER (sec)
simple1         |          309 | No        |             0.774 |             0.064 |          0.076
simple2         |          504 | Yes       |             0.867 |             0.070 |          0.070
simple3         |          214 | Yes       |             0.899 |             0.095 |          0.095
array_shift     |          874 | Yes       |             2.889 |             0.126 |          0.078
array_simple    |         3440 | Yes       |           Timeout |             9.998 |          7.154
array_rotation1 |         1834 | Yes       |             7.227 |             0.331 |          0.229
array_rotation2 |        15182 | Yes       |           Timeout |           Timeout |         31.632
abp             |         6909 | No        |             9.492 |             0.631 |          2.288
train           |         1823 | Yes       |            19.237 |             0.763 |          0.678
hw1             |          322 | Yes       |             1.861 |             0.163 |          0.172
hw2             |          674 | Yes       |            24.111 |             0.308 |          0.473


The results of the experiments are given in Table 1. We applied the tool first to 
several array logic entailments, which occur as verification conditions for imper- 
ative programs with arrays [2] (array_shift, array_simple, array_rotation1+2) 


6 The implementation is available at https://github.com/cathiec/JAltImpact.
7 http://mathsat.fbk.eu/.
8 https://github.com/sosy-lab/java-smt.
9 http://www.fit.vutbr.cz/research/groups/verifit/tools/includer/.
available online [19]. Next, we applied it to proving safety properties of hardware circuits (hw1+2) [22]. Finally, we considered two timed communication protocols, consisting of systems that are asynchronous compositions of timed automata, whose correctness specifications are given by timed automata monitors: a timed version of the Alternating Bit Protocol (abp) [25] and a controller of a railroad crossing (train) [9]. All results were obtained on an x86_64 Linux Ubuntu virtual machine with 8GB of RAM running on an Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00 GHz. The automata sizes are given as the number of bytes needed to store their ASCII description in a file, and the execution times are in seconds.

As in the case of non-alternating nondeterministic integer programs [17], 
the alternating version of IMPACT (Algorithm 2) outperforms lazy predicate 
abstraction for checking emptiness by at least one order of magnitude. Moreover, 
IMPACT is comparable, on average, to the previous implementation of INCLUDER, which also uses MathSAT5, via its C API. We believe that INCLUDER outperforms IMPACT on some examples because of the hardness of the UFLIA entailment checks used in Algorithm 2 (line 14, and line 3 of the function CLOSE), as opposed to the pure LIA entailment checks used in INCLUDER. According to our
statistics, Algorithm 2 spends more than 50% of the time waiting for the SMT 
solver to finish answering entailment queries. 
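The order-of-magnitude comparison can be checked directly against Table 1. The snippet below only recomputes, from the running times in the table, the ratio between Algorithm 1 and Algorithm 2 on the rows where neither timed out. On this data the ratio ranges from about 9.5 (simple3) to about 78 (hw2), and it is at least 10 on eight of the nine rows.

```python
# Running times in seconds from Table 1: (Algorithm 1, Algorithm 2); None = timeout.
TIMES = {
    "simple1": (0.774, 0.064),
    "simple2": (0.867, 0.070),
    "simple3": (0.899, 0.095),
    "array_shift": (2.889, 0.126),
    "array_simple": (None, 9.998),
    "array_rotation1": (7.227, 0.331),
    "array_rotation2": (None, None),
    "abp": (9.492, 0.631),
    "train": (19.237, 0.763),
    "hw1": (1.861, 0.163),
    "hw2": (24.111, 0.308),
}


def speedups(times):
    """Ratio time(Algorithm 1) / time(Algorithm 2) on rows where both finished."""
    return {name: t1 / t2 for name, (t1, t2) in times.items()
            if t1 is not None and t2 is not None}


if __name__ == "__main__":
    for name, ratio in sorted(speedups(TIMES).items(), key=lambda kv: kv[1]):
        print(f"{name:16s} {ratio:6.1f}x")
```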
Abstract. Formal methods applications often rely on SMT solvers to automatically discharge proof obligations. SMT solvers handle quantified formulas using incomplete heuristic techniques like E-matching, and often resort to model-based quantifier instantiation (MBQI) when these techniques fail. This paper revisits enumerative instantiation, a technique that considers instantiations based on exhaustive enumeration of ground terms. Although simple, we argue that enumerative instantiation can supplement other instantiation techniques and be a viable alternative to MBQI for valid proof obligations. We first present a stronger Herbrand Theorem, better suited as a basis for the instantiation loop used in SMT solvers; it furthermore requires considering fewer instances than classical Herbrand instantiation. Based on this result, we present different strategies for combining enumerative instantiation with other instantiation techniques in an effective way. The experimental evaluation shows that the implementation of these new techniques in the SMT solver CVC4 leads to significant improvements in several benchmark libraries, including many stemming from verification efforts.


1 Introduction 


In many formal methods applications, such as verification, it is common to rep- 
resent proof obligations in terms of the Satisfiability Modulo Theories (SMT) 
problem. SMT solvers have thus become popular backends for such applications. 
They have been primarily designed to be decision procedures for quantifier-free 
problems, on which they are highly efficient and capable of handling large for- 
mulas over background theories. Quantified formulas are generally handled with 
instantiation techniques that are often incomplete, even on decidable or semi-decidable fragments. Heavily relying on incomplete heuristics, however, leads to instability and unpredictability in the solver's behavior, which is undesirable for the tools relying on them. To address these issues some systems use model-based


© The Author(s) 2018 
D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 112-131, 2018. 
https://doi.org/10.1007/978-3-319-89963-3_7 


Revisiting Enumerative Instantiation 113 


instantiation (MBQI) [19], a complete technique for first-order logic with equal- 
ity and for several restricted fragments containing theories, which can be used 
as a fallback strategy to the incomplete techniques. 

In this paper we introduce a novel enumerative instantiation technique which 
can serve as a simpler alternative to model-based instantiation. Similar to MBQI, 
our technique can be used as a secondary strategy when incomplete techniques 
fail. Our experiments show that a careful implementation of this technique in 
the state-of-the-art SMT solver CVC4 leads to noticeable gains in performance 
on unsatisfiable problems. 


Background. Some of the earliest tools for theorem proving in first-order logic 
come from the work by Skolem and Herbrand. The Herbrand Theorem states 
that if a closed formula in Skolem normal form, i.e. a prenex formula without 
existential quantifiers, is unsatisfiable, then there is an unsatisfiable finite con- 
junction of Herbrand instances of the formula, that is, instances on terms from 
the Herbrand universe, i.e. the set of all possible well-sorted ground terms in the 
formula’s signature. The first theorem provers for first-order logic to be imple- 
mented based on Herbrand’s theorem employed a completely unguided search 
on the Herbrand universe (e.g. the early efforts of Gilmore [20] and Davis et al. [11]).
Such systems were only capable of dealing with very simple formulas and were 
soon put aside. Techniques which would only generate Herbrand instances when 
needed were first introduced by Prawitz [24] and later refined by Davis and 
Putnam [12], culminating in the resolution calculus introduced by Robinson [30]. 
The most successful techniques for handling pure first-order logic have been 
based on resolution and ordering criteria [3]. More recently, techniques based on 
instantiation have shown promise for first-order logic as well [13,17,28]. Inspired 
by early work on the subject, this paper revisits whether modern implementa- 
tions of the latter class of techniques can benefit from enumerative instantiation. 


Outline. We first give preliminaries in Sect. 2. Then, in Sect. 3, we introduce a stronger Herbrand Theorem as the basis for making enumerative instantiation practical enough to be used in modern systems. We formalize the different instantiation strategies used by state-of-the-art SMT solvers, discuss their
strengths and weaknesses, and present a schematization of how to combine such 
strategies in Sect.4, with a focus on a new strategy for enumerative instan- 
tiation. An extensive experimental evaluation of enumerative instantiation as 
implemented in CVC4 is presented in Sect. 5. 
2 Preliminaries


We work in the context of many-sorted first-order logic with equality (see 
e.g. [16]) and assume the reader is familiar with the notions of signature, term, 
(quantified and ground) formula, atom, literal, free and bound variable, and 
substitution. 

We consider signatures $\Sigma$ containing a Bool sort and constants $\top$, $\bot$, and a family of predicate symbols $(\approx : \tau \times \tau \to \mathsf{Bool})$ interpreted as equality for each sort $\tau$. Without loss of generality, we assume $\approx$ is the only predicate in $\Sigma$. We use $=$ for syntactic equality. The set of all terms occurring in a formula $\varphi$ (resp. term $t$) is denoted by $T(\varphi)$ (resp. $T(t)$). We write $\bar{t}$ for the sequence of terms $t_1, \ldots, t_n$ for an unspecified $n \in \mathbb{N}^*$ that is either irrelevant or deducible from the context.

An interpretation is a triple $\mathcal{M} = (\mathfrak{D}, \mathfrak{I}, \mathfrak{V})$ in which $\mathfrak{D}$ is a collection of non-empty domain sets for all sorts in $\Sigma$, $\mathfrak{I}$ interprets symbols by mapping them into functions over domain sets according to the symbol sort, and $\mathfrak{V}$ maps free variables to elements of their respective domain sets. A theory is a pair $\mathcal{T} = (\Sigma, \Omega)$ in which $\Sigma$ is a signature and $\Omega$ is a class of interpretations, called the models of $\mathcal{T}$. The empty theory is the theory for which the class of interpretations $\Omega$ is unrestricted, which coincides with first-order logic with equality. Throughout this paper we assume a fixed background theory $\mathcal{T}$, which unless otherwise stated is the empty theory. A formula $\varphi$ is satisfiable (resp. unsatisfiable) in $\mathcal{T}$ if it is satisfied by some (resp. no) interpretation $\mathcal{M} \in \Omega$, written $\mathcal{M} \models_{\mathcal{T}} \varphi$. A formula $\varphi$ entails in $\mathcal{T}$ a formula $\psi$, written $\varphi \models_{\mathcal{T}} \psi$, if every interpretation in $\Omega$ satisfying $\varphi$ also satisfies $\psi$. For these notions of model satisfaction and entailment in the empty theory, we omit the subscript.

A substitution $\sigma$ maps variables to terms and its domain, $\mathrm{dom}(\sigma)$, is finite. We write $\mathrm{ran}(\sigma)$ to denote its range. Throughout the paper, conjunctions may be
written as sets or tuples, and vice-versa, whenever convenient and unambiguous. 
All definitions are assumed to be lifted in the expected way from formulas into 
sets or tuples of formulas. 


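[Figure: an SMT solver consisting of a ground SMT solver and an instantiation module. The SMT formula is the input, instances flow from the instantiation module back to the ground SMT solver, and UNSAT is the output when a refutation is found.]

Fig. 1. The SMT instantiation loop for quantified formulas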
Instantiation-Based SMT Solvers


Quantifiers in formulas are generally handled by SMT solvers through 
instantiation-based techniques, which capitalize on their capability to handle 
large ground formulas. In this approach, an input formula $\psi$ is given to the ground SMT solver, which abstracts all atoms and quantified formulas and treats them as if they were propositional variables. The solver for ground formulas provides an assignment $E \cup Q$, where $E$ is a set of ground literals and $Q$ is a set of quantified formulas appearing in $\psi$, such that $E \cup Q$ propositionally entails $\psi$. We assume that all quantified formulas in $\psi$ are of the form $\forall \bar{x}.\, \varphi$ with $\varphi$ quantifier-free. This can be achieved by prenex form transformation and Skolemization. The instantiation module of the solver then generates new ground formulas of the form $(\forall \bar{x}.\, \varphi) \Rightarrow \varphi\sigma$, where $\forall \bar{x}.\, \varphi$ is a quantified formula in $Q$ and $\sigma$ is a substitution from the variables in $\varphi$ to ground terms. These instances are added conjunctively to the input of the ground solver, hence refining its knowledge of the quantified formulas. The ground solver may then provide another assignment $E' \cup Q'$, which entails both $\psi$ and
the newly added instances. This new assignment might either be the previous 
one, augmented by new ground literals coming from the new instances, or if the 
previous E has been refuted by the new instances, a completely different set. On 
the other hand, the process may terminate if the newly added instances suffice 
to prove the unsatisfiability of the original formula. We will refer to the game 
between the ground solver that provides assignments for the abstraction of the 
formula and the instantiation module that provides instances added conjunc- 
tively to the formula, as the instantiation loop of the SMT solver (see Fig. 1). 


3 Herbrand Theorem and Beyond 


The Herbrand Theorem (see e.g. [16]) for pure first-order logic with equality¹ provides a refutationally complete procedure to check the satisfiability of a formula $\psi$, or more specifically of a set of literals and quantifiers $E \cup Q$. Indeed, $E \cup Q$ is satisfiable if and only if $E \cup Q_H$ is satisfiable, where $Q_H$ is the set of all (Herbrand) instances one can build from the quantifiers in $Q$ by instantiation with the Herbrand universe, i.e. all the possible well-sorted terms built on the signature used in $E \cup Q$. Based on this, an instantiation module has a simple
refutationally complete strategy for pure first-order logic with equality: it suf- 
fices to enumerate Herbrand instances. The major drawback of this strategy is 
that the Herbrand universe is large. For instance, as soon as there is a function 
with the range sort also used as an argument, the Herbrand universe is infinite. 
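The following sketch makes the last remark concrete. It enumerates ground terms level by level for a single-sorted signature, which simplifies the well-sorted setting of the text. With a constant $a$ and a unary function $f$ whose range sort is also its argument sort, every level adds a new term, so the enumeration never stabilizes. With constants only, nothing is added after level 0. Terms are plain strings, and the function name is illustrative.

```python
from itertools import product
from typing import Dict, Set


def herbrand_terms(constants: Set[str], functions: Dict[str, int], depth: int) -> Set[str]:
    """Ground terms of nesting depth <= `depth` over one sort (terms as strings)."""
    terms: Set[str] = set(constants)
    for _ in range(depth):
        # apply every function symbol to every tuple of already built terms
        new = {f"{f}({', '.join(args)})"
               for f, arity in functions.items()
               for args in product(sorted(terms), repeat=arity)}
        terms |= new
    return terms
```

For instance, `herbrand_terms({"a"}, {"f": 1}, 2)` yields `{"a", "f(a)", "f(f(a))"}`, and each further level adds one more nesting of `f`.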


1 The Herbrand Theorem is generally presented in pure first-order logic without equal- 
ity, but it also holds for equality: it suffices to consider the equality axioms conjunc- 
tively with formulas. 


116 A. Reynolds et al. 


Fortunately, a stronger variant of the Herbrand Theorem holds. Using this variant, the instantiation module does not need to consider all possible well-sorted terms (i.e. the full Herbrand universe), but only the terms already available in $E \cup Q$, and those subsequently generated.


Theorem 1. Consider the conjunctive sets $E$ and $Q$ of ground literals and universally quantified clauses respectively, where $T(E)$ contains at least one term of each sort. The set $E \cup Q$ is unsatisfiable in pure first-order logic if and only if there exists a series $Q_i$ of finite sets of instances of $Q$ such that


- for some number $n$, the finite set of formulas $E \cup \bigcup_{i=1}^{n} Q_i$ is unsatisfiable;
- $Q_{i+1} \subseteq \{\varphi\sigma \mid \forall \bar{x}.\, \varphi \in Q,\ \mathrm{ran}(\sigma) \subseteq T(E \cup \bigcup_{j=1}^{i} Q_j)\}$.


Proof. All proofs for this section are included in [26]. 


The above theorem is stronger than the classical Herbrand theorem in the sense that the set of instances considered above is smaller than (or equal to) the set of instances considered in the classical Herbrand theorem. As a trivial example, if a function $f$ appears in $E \cup Q$ only in ground terms, no new applications of $f$ are considered. The theorem does not consider arbitrary terms from the signature, but only those generated by successive instantiations with already available ground terms. Note that the theorem holds for pure first-order logic with equality, and in any theory that preserves the compactness property. In the latter case, however, it is also necessary to consider the axioms of the theory for the generation of new terms, which might lead to other instances.
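Read operationally, Theorem 1 lets each round instantiate only with terms that already occur in $E$ or in previously generated instances. The sketch below traces this for a hypothetical clause $\forall x.\, P(f(x))$ with $a$ as the only term of the sort of $x$ in $T(E)$. The clause is not an example from the paper. Terms are strings, and sorts other than that of $x$ are ignored. Each round makes exactly one new term available, so terms are produced on demand rather than drawn from the whole Herbrand universe at once. Had the clause been $\forall x.\, P(x)$, no new term would ever appear, in line with the remark above about ground occurrences of $f$.

```python
def apply_f(t: str) -> str:
    return f"f({t})"


def theorem1_rounds(initial_terms: set, n: int) -> list:
    """Instances of the hypothetical clause  forall x. P(f(x))  produced in n rounds."""
    available = set(initial_terms)   # terms of T(E + Q_1 + ... + Q_i) of the sort of x
    produced = set()
    history = []
    for _ in range(n):
        # ran(sigma) must lie in the terms available so far (Theorem 1)
        new = {f"P({apply_f(t)})" for t in available} - produced
        history.append(sorted(new))
        produced |= new
        # each instance P(f(t)) contributes its subterm f(t) for the next round
        available |= {apply_f(t) for t in available}
    return history
```

With `theorem1_rounds({"a"}, 3)` the rounds produce `P(f(a))`, then `P(f(f(a)))`, then `P(f(f(f(a))))`.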

In the Bernays-Schönfinkel-Ramsey fragment of first-order logic (also known as the EPR class), formulas do not contain non-constant function symbols; therefore the Herbrand universe of any formula is a finite set. Since the above sets of terms are subsets of the Herbrand universe, the enumeration always terminates, even when the formula is satisfiable. The resulting ground problem is therefore decidable, and the above method constitutes a decision procedure for this fragment, just like some variants of model-based quantifier instantiation.

Theorem 1 implies that an instantiation module only has to consider terms 
occurring within assignments, and not all possible terms. To show refutational 
completeness (termination on unsatisfiable input) and model soundness (termi- 
nation without declaring unsatisfiability implies that the input is satisfiable), it 
is however necessary to account for the successive assignments produced by the 
ground SMT solver and the consecutive generation of instances. This is achieved 
using the following lemma. 


Lemma 1. Consider the conjunctive sets E and Q of ground literals and uni- 
versally quantified clauses respectively where T(E) contains at least one term 
of each sort. If there exists an infinite series of finite satisfiable sets of ground 
literals $E_i$ and of finite sets of ground instances $Q_i$ of $Q$ such that

- $Q_i = \{\varphi\sigma \mid \forall \bar{x}.\, \varphi \in Q,\ \mathrm{dom}(\sigma) = \{\bar{x}\} \wedge \mathrm{ran}(\sigma) \subseteq T(E_i)\}$;
- $E_0 = E$, $E_{i+1} \models E_i \cup Q_i$;


then $E \cup Q$ is satisfiable in the empty theory with equality.


The above lemma has two direct consequences on the instantiation loop of 
SMT solvers, where instances are generated from the set of available terms in 
the ground assignment provided by the ground SMT solver. The following two 
corollaries state the model soundness and the refutational completeness of the 
instantiation loop respectively. 


Corollary 1. Given a formula $\psi$, suppose there exist a satisfiable set of literals $E$ and a set of quantified clauses $Q$ such that $E \cup Q \models \psi$, and that the instantiation module of the SMT solver cannot generate any new instance, i.e. $E$ already entails all instances of $Q$ for substitutions built with terms in $T(E)$. Then $\psi$ is satisfiable.


Proof. A formal statement of the corollary and a proof is available in [26]. 


Corollary 2. Given an unsatisfiable formula, if the generation of instances is fair, the instantiation loop of the SMT solver terminates.


Proof. A formal statement of the corollary and a proof is available in [26]. 


c(E, ∀x̄. φ): 1. Either return {σ} where E, φσ ⊨ ⊥, or return ∅.

e(E, ∀x̄. φ): 1. Select a set of triggers {t̄_1, ..., t̄_n} for ∀x̄. φ.
             2. For each i = 1, ..., n, select a set of substitutions S_i such that
                for each σ ∈ S_i, E ⊨ t̄_i σ ≈ ḡ_i for some tuple ḡ_i ∈ T(E).
             3. Return ⋃_i S_i.

m(E, ∀x̄. φ): 1. Construct a model ℳ for E.
             2. Return {{x̄ ↦ t̄}} where t̄ ∈ T(E) and ℳ ⊭ φ{x̄ ↦ t̄}, or ∅ if none exists.

u(E, ∀x̄. φ): 1. Choose an ordering ≺ on tuples of quantifier-free terms.
             2. Return {{x̄ ↦ t̄}} where t̄ is a minimal tuple of terms w.r.t. ≺ such that
                t̄ ∈ T(E) and E ⊭ φ{x̄ ↦ t̄}, or ∅ if none exists.


Fig. 2. Quantifier instantiation strategies: Conflict-based Instantiation (c), E-matching instantiation (e), Model-based Instantiation (m) and Enumerative Instantiation (u).


4 Quantifier Instantiation in CDCL($\mathcal{T}$)


This section overviews recent techniques used by SMT solvers for quantifier 
instantiation, and comments on their relative strengths and weaknesses. We will 
focus on enumerative quantifier instantiation, a technique which has received little attention in recent work, but has several compelling advantages with respect to current techniques.


Definition 1 (Instantiation Strategy). An instantiation strategy takes as 
input: 


1. A $\mathcal{T}$-satisfiable set of ground literals $E$, and
2. A quantified formula $\forall \bar{x}.\, \varphi$.


It outputs a set of substitutions $\{\sigma_1, \ldots, \sigma_n\}$ where $\mathrm{dom}(\sigma_i) = \bar{x}$ for each $i = 1, \ldots, n$.


Figure 2 gives four instantiation strategies used by modern SMT solvers, each of which has the interface given in Definition 1. The first three have been described
in detail in previous works (see [25] for a recent overview). We briefly review these 
techniques in this section. The fourth, enumerative quantifier instantiation, is the 
subject of this paper. 

Conflict-based instantiation (c) was introduced in [28] as a technique for 
improving the performance of SMT solvers for unsatisfiable problems. In this 
strategy, we return a substitution $\sigma$ such that $\varphi\sigma$ together with $E$ is unsatisfiable. We refer to $\varphi\sigma$ as a conflicting instance (for $E$). Typical implementations of this strategy do not insist that a conflicting instance be returned if one exists, and hence the strategy may choose to return the empty set of substitutions. Recent work [4,5] gives a strategy for conflict-based instantiation that has refutational completeness guarantees for the empty theory with equality: when a conflicting instance exists for a quantified formula in this theory, the strategy is guaranteed to return it.

E-matching instantiation (e) is the most commonly used strategy for quan- 
tifier instantiation in modern SMT solvers [13,15,18]. In this strategy, we first 
heuristically choose a set of triggers for a quantified formula $\forall \bar{x}.\, \varphi$, where a trigger is a tuple of terms whose free variables are $\bar{x}$. In practice, triggers can be selected using user-provided annotations, or selected automatically by the SMT solver. For each trigger $\bar{t}_i$, we select a set of substitutions $S_i$ such that for each $\sigma$ in this set, $E$ entails that $\bar{t}_i\sigma$ is equal to a tuple of ground terms $\bar{g}_i$ in $E$. We return the union of these sets $S_i$ for each selected trigger. E-matching instantiation is
hence is a key component of most SMT solvers that support quantified formulas. 
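As a rough illustration of the matching step only, the sketch below matches a single-term trigger purely syntactically against the ground terms of $E$. This is a deliberate simplification. Real E-matching matches modulo the equalities entailed by $E$, typically using congruence closure, and that part is omitted here. Terms are nested tuples, and variables are strings starting with `?`, a convention chosen for this sketch. On the terms of Example 1 below, the trigger $P(x)$ yields the substitutions $x \mapsto a$, $x \mapsto b$ and $x \mapsto c$.

```python
from typing import Dict, Iterable, List, Optional, Tuple, Union

Term = Union[str, Tuple]   # "a" constant, "?x" variable, ("f", t1, ..., tn) application


def match_term(pattern: Term, term: Term, sigma: Dict[str, Term]) -> Optional[Dict[str, Term]]:
    """Extend sigma so that pattern*sigma == term syntactically, or return None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in sigma:
            return sigma if sigma[pattern] == term else None
        return {**sigma, pattern: term}
    if isinstance(pattern, str) or isinstance(term, str):
        return sigma if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p_arg, t_arg in zip(pattern[1:], term[1:]):
        sigma = match_term(p_arg, t_arg, sigma)
        if sigma is None:
            return None
    return sigma


def ematch_syntactic(trigger: Term, ground_terms: Iterable[Term]) -> List[Dict[str, Term]]:
    """Substitutions sigma with trigger*sigma equal to some ground term (no equality reasoning)."""
    out: List[Dict[str, Term]] = []
    for g in ground_terms:
        s = match_term(trigger, g, {})
        if s is not None and s not in out:
            out.append(s)
    return out
```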

Model-based quantifier instantiation (m) was introduced in [19], and has 
also been used for improving the performance of finite model finding [29]. In this 
strategy, we first construct a model $\mathcal{M}$ for the quantifier-free portion of our input $E$, where typically the interpretations of functions for values not constrained by $E$ are chosen heuristically. Notice that $\mathcal{M}$ does not necessarily satisfy the quantified formula $\forall \bar{x}.\, \varphi$. If it does not, we return a single substitution $\sigma$ for which $\mathcal{M}$ does not satisfy $\varphi\sigma$, where typically $\sigma$ maps variables from $\bar{x}$ to terms that occur in $T(E)$. Compared with conflict-based and E-matching instantiation, model-based quantifier instantiation has the advantage that it is model sound: when it returns $\emptyset$, then $E \cup \{\forall \bar{x}.\, \varphi\}$ is satisfiable.

This paper revisits enumerative quantifier instantiation (u) as a viable
alternative to model-based quantifier instantiation. In this strategy, we assume an
ordering ≺ on quantifier-free terms. This ordering is not related to the usual
term ordering one generally uses for saturation theorem proving, but rather
determines which instance will be generated first. The strategy returns the
substitution {x̄ ↦ t̄}, where t̄ is the minimal tuple of terms with respect to ≺
from T(E) such that φ{x̄ ↦ t̄} is not entailed by E. We refer to this strategy
as enumerative instantiation since in the worst case it generates instantiations
by enumerating tuples of all terms of the proper sort from E, according to the
ordering ≺. In practice, the number of instantiations produced by this strategy
is kept small by interleaving it with other strategies like c or e, or due to the fact
that a small number of instances may already allow the SMT solver to conclude
that the input is unsatisfiable. Moreover, thanks to the results in Sect. 3, this
strategy is refutationally complete and model sound for quantified formulas in the
empty theory with equality.


Example 1. Consider the set of ground literals E = {¬P(a), ¬P(b), P(c), ¬R(b)}.
For the input (E, ∀x. P(x) ∨ R(x)), the strategies in this section will do the
following.


1. Conflict-based: Since E, P(b) ∨ R(b) ⊨ ⊥, this strategy will return {{x ↦ b}}.

2. E-matching: This strategy may choose the singleton set of triggers {(P(x))}.
Based on this trigger, since E ⊨ P(x){x ↦ t} ≈ P(t) where P(t) ∈ T(E) for
t = a, b, c, this strategy may return {{x ↦ a}, {x ↦ b}, {x ↦ c}}.

3. Model-based: This strategy will construct a model M for E, where we assume
that P^M = λx. ite(x ≈ c, ⊤, ⊥) and R^M = λx. ⊥. Since M does not satisfy
P(a) ∨ R(a), this strategy may return {{x ↦ a}}.

4. Enumerative instantiation: This strategy chooses an ordering on tuples of
terms, say the lexicographic extension of ≺ where a ≺ b ≺ c. Since E does
not entail P(a) ∨ R(a), this strategy returns {{x ↦ a}}.


In the previous example, clearly {x ↦ b} is the most useful substitution, since
it leads to an instance P(b) ∨ R(b) which together with E is unsatisfiable. The
substitution {x ↦ c} is definitely not a useful substitution, since its instance
P(c) ∨ R(c) is already entailed by P(c) ∈ E. The substitution {x ↦ a} is
potentially useful since it forces the solver to satisfy P(a) ∨ R(a). Here, we point
out that the effect of enumerative instantiation and model-based instantiation is
essentially the same, as both return an instance that is not entailed by E. However,
the substitutions produced by enumerative instantiation often have advantages
with respect to model-based instantiation on unsatisfiable problems.


Example 2. Consider the set of ground literals E = {¬P(a), R(b), S(c)} and the
quantified clauses Q = {∀x. R(x) ∨ S(x), ∀x. ¬R(x) ∨ P(x), ∀x. ¬S(x) ∨ P(x)}
in a mono-sorted signature. Notice that E ∪ Q is unsatisfiable: it suffices to
consider the instances of the three quantified formulas in Q with x ↦ a. On such
an input, model-based instantiation will first construct a model for E. Assume
this model M is such that P^M = λx. ⊥, R^M = λx. ite(x ≈ b, ⊤, ⊥), and
S^M = λx. ite(x ≈ c, ⊤, ⊥). Assume also that enumerative instantiation chooses
the lexicographic extension of a term ordering ≺ where a ≺ b ≺ c. The following
table summarizes the result of running the two strategies.


φ                x s.t. M ⊭ φ    x s.t. E ⊭ φ    m(E, ∀x. φ)    u(E, ∀x. φ)
R(x) ∨ S(x)      a               a               {{x ↦ a}}      {{x ↦ a}}
¬R(x) ∨ P(x)     b               a, b, c         {{x ↦ b}}      {{x ↦ a}}
¬S(x) ∨ P(x)     c               a, b, c         {{x ↦ c}}      {{x ↦ a}}


The second and third columns show the sets of possible values of x that are
considered with model-based and enumerative instantiation respectively, and the
fourth and fifth columns show one possible selection. The instances corresponding
to the three substitutions returned by enumerative instantiation, R(a) ∨ S(a),
¬R(a) ∨ P(a) and ¬S(a) ∨ P(a), when conjoined with ¬P(a) from E are
unsatisfiable, whereas the instances produced by model-based instantiation do not
suffice to show that E ∪ Q is unsatisfiable. Hence, the latter will consider an
extension of E that satisfies the instances R(a) ∨ S(a), ¬R(b) ∨ P(b) and
¬S(c) ∨ P(c) and guess another model for this extension.
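The table can be reproduced by a small C sketch. It hard-codes E and the model M of Example 2, uses a purely syntactic entailment test (an instance counts as entailed only when one of its literals already occurs in E, which suffices here but is much weaker than a solver's check), and lets each strategy pick the first suitable term in the order a ≺ b ≺ c. All identifiers are illustrative, not taken from CVC4.

```c
#include <assert.h>

enum { A, B, C, NTERMS };                 /* ground terms, ordered a < b < c */
enum { P, R, S };                         /* unary predicate symbols */
enum { V_FALSE = 0, V_TRUE = 1, V_UNKNOWN = 2 };

typedef struct { int pred; int positive; } Lit;   /* pred(x) or not pred(x) */
typedef struct { Lit lits[2]; } Clause;           /* binary clause in x */

/* Value of pred(t) according to E = { not P(a), R(b), S(c) }. */
static int e_value(int pred, int t) {
    if (pred == P && t == A) return V_FALSE;
    if (pred == R && t == B) return V_TRUE;
    if (pred == S && t == C) return V_TRUE;
    return V_UNKNOWN;
}

/* Value of pred(t) in the model M: P false, R true only on b, S only on c. */
static int m_value(int pred, int t) {
    return (pred == R && t == B) || (pred == S && t == C);
}

/* Syntactic entailment: some literal of the instance already holds in E. */
static int e_entails(const Clause *cl, int t) {
    for (int i = 0; i < 2; i++)
        if (e_value(cl->lits[i].pred, t) == cl->lits[i].positive)
            return 1;
    return 0;
}

static int m_satisfies(const Clause *cl, int t) {
    for (int i = 0; i < 2; i++)
        if (m_value(cl->lits[i].pred, t) == cl->lits[i].positive)
            return 1;
    return 0;
}

/* m: first term whose instance is false in M; -1 if M satisfies all. */
int model_based(const Clause *cl) {
    for (int t = 0; t < NTERMS; t++)
        if (!m_satisfies(cl, t)) return t;
    return -1;
}

/* u: first term whose instance is not entailed by E; -1 if none. */
int enumerative(const Clause *cl) {
    for (int t = 0; t < NTERMS; t++)
        if (!e_entails(cl, t)) return t;
    return -1;
}
```

For the three clauses of Q, model_based picks a, b and c respectively, whereas enumerative picks a every time, as in the last two columns of the table.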


A key observation is that useful instantiations can be obscured by guesses
made when constructing models M. Here, since we decided R(a)^M = ⊥, the
substitution {x ↦ a} was not considered when applying model-based
instantiation to the second quantified formula, and since S(a)^M = ⊥, the
substitution {x ↦ a} was not considered when applying it to the third. In
implementations of model-based instantiation, certain values in models are chosen
heuristically, leading to this behavior. This is done out of necessity, since
determining whether there exists a model that satisfies quantified formulas, even
for a fixed context, is a challenging problem.

On the other hand, the range of substitutions considered by enumerative
instantiation in the previous example includes all terms that correspond to
instances that are not entailed by E. The substitutions it considers are "minimally
diverse", that is, in the previous example they introduce new predicates
on term a only, whereas model-based instantiation introduces new predicates
on a, b and c. Reducing the number of new terms introduced by instantiations
can have a significant positive impact on performance in practice. Furthermore,
enumerative instantiation has the advantage that a term ordering allows
fine-grained heuristics better suited for unsatisfiable problems, which we comment
on in Sect. 4.1.


Example 3. Consider the sets E = {a ≉ b, b ≉ c, a ≉ c} and Q = {∀x. P(x)}. For
the input (E, ∀x. P(x)), model-based quantifier instantiation will first construct a
model M for E, where we assume that P^M = λx. ⊤. It is easy to see that
M ⊨ P(x){x ↦ t} for t = a, b, c ∈ T(E), and hence it returns the empty set of
substitutions, indicating that E ∪ Q is satisfiable. On the other hand, assume
enumerative instantiation chooses the lexicographic extension of a term ordering
≺ where a ≺ b ≺ c. Since E ⊭ P(a) and a is smaller than b and c according to ≺,
u(E, ∀x. P(x)) returns the set containing {x ↦ a}. Subsequently and for similar
reasons, two more iterations of this strategy will be invoked, resulting in the
instances P(b) and P(c) before it terminates with the empty set.


In this example, model-based instantiation was able to terminate on the first 
iteration, since it guessed the correct interpretation for P, whereas enumerative 
instantiation considered substitutions mapping x to each ground term a, b, c
from E. For this reason, model-based instantiation is typically better suited for 
satisfiable problems. 


4.1 Implementing Enumerative Instantiation 


We comment on several important details concerning the implementation of 
enumerative quantifier instantiation in the SMT solver CVC4. 


Term Ordering. Given a term ordering ≺, CVC4 considers the extension to
tuples of terms such that:

$$
(t_1,\dots,t_n) \prec (s_1,\dots,s_n) \;\overset{\text{def}}{\iff}\;
\begin{cases}
\max_{i=1}^{n} t_i \prec \max_{i=1}^{n} s_i, \text{ or}\\
\max_{i=1}^{n} t_i = \max_{i=1}^{n} s_i \text{ and } (t_1,\dots,t_n) \prec_{\mathrm{lex}} (s_1,\dots,s_n)
\end{cases}
$$

where the maxima are taken with respect to ≺ and ≺_lex is the lexicographic
extension of ≺. For example, if a ≺ b ≺ c, then we have that
(a, a) ≺ (a, b) ≺ (b, a) ≺ (b, b) ≺ (a, c) ≺ (c, b) ≺ (c, c). By this
ordering, we consider substitutions involving c only after all combinations of
substitutions involving a and b are considered. This choice is important since it
leads to instantiations that introduce fewer terms, and are thus more likely to
lead to conflicts at the ground level.
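A minimal C sketch of this tuple ordering, assuming (for illustration only) that each term is identified by its rank in the underlying ordering ≺, so that a, b, c become 0, 1, 2. The function name tuple_cmp is hypothetical; it returns a negative, zero or positive value in the style of strcmp.

```c
#include <assert.h>
#include <stddef.h>

/* Largest rank occurring in the tuple t of length n. */
static int max_rank(const int *t, size_t n) {
    int m = t[0];
    for (size_t i = 1; i < n; i++)
        if (t[i] > m)
            m = t[i];
    return m;
}

/* Compares two n-tuples of term ranks: first by their maximal element,
   then lexicographically. Negative if t precedes s, 0 if equal. */
int tuple_cmp(const int *t, const int *s, size_t n) {
    assert(n > 0);
    int mt = max_rank(t, n), ms = max_rank(s, n);
    if (mt != ms)
        return mt < ms ? -1 : 1;
    for (size_t i = 0; i < n; i++)
        if (t[i] != s[i])
            return t[i] < s[i] ? -1 : 1;
    return 0;
}
```

A purely lexicographic comparison would instead place (a, c) before (b, a); comparing maxima first is what delays every tuple containing c until all tuples over a and b have been tried.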

The underlying term ordering is determined dynamically based on the current
set of assertions E. At all times, we maintain a finite list of quantifier-free terms
such that we have fixed the ordering t₁ ≺ ... ≺ tₙ. Then, if all combinations
of instantiations for t₁, ..., tₙ are currently entailed by E, we choose a term


122 A. Reynolds et al. 


t ∈ T(E) such that E ⊭ t ≈ tᵢ for i = 1, ..., n, if one exists, and append it
to our ordering so that tₙ ≺ t. The particular choice of t beyond this criterion is
arbitrary. An experimental evaluation of more sophisticated term orderings, such
as those inspired by first-order automated theorem proving [2], is the subject of
future work.
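Because tuples are compared by their maximal element first, appending a new term to the ordering only creates tuples that come after every tuple over the terms already fixed, so the enumeration can resume exactly where it stopped. The hypothetical helper below (a sketch, again using ranks 0, ..., j for terms) lists, in ≺ order, the k-tuples that become available when the term of rank j is appended: those over ranks 0, ..., j that mention j, of which there are (j+1)^k − j^k.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARITY 8

/* Writes into out (row-major, k ints per tuple, room for cap tuples) all
   k-tuples over ranks 0..j whose maximal element is j, in lexicographic
   order, and returns how many were written. */
size_t tuples_with_max(int j, size_t k, int *out, size_t cap) {
    assert(j >= 0 && k > 0 && k <= MAX_ARITY);
    int cur[MAX_ARITY] = {0};
    size_t count = 0;
    for (;;) {
        /* keep the tuple only if it mentions the newly appended term j */
        int has_j = 0;
        for (size_t i = 0; i < k; i++)
            if (cur[i] == j)
                has_j = 1;
        if (has_j) {
            assert(count < cap);
            for (size_t i = 0; i < k; i++)
                out[count * k + i] = cur[i];
            count++;
        }
        /* odometer step over {0..j}^k, rightmost position fastest */
        size_t pos = k;
        while (pos > 0 && cur[pos - 1] == j) {
            cur[pos - 1] = 0;
            pos--;
        }
        if (pos == 0)
            break;
        cur[pos - 1]++;
    }
    return count;
}
```

For j = 2 (term c) and k = 2 this yields (a, c), (b, c), (c, a), (c, b), (c, c), which all follow (b, b) in the ordering above.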


Entailment Checks. For a set of ground equalities and disequalities E, a quantified
formula ∀x̄. φ and a substitution {x̄ ↦ t̄}, CVC4 implements a two-layered method
for checking whether the entailment E ⊨ φ{x̄ ↦ t̄} holds. First, we maintain a
cache of instantiations that have already been returned on previous iterations.
Hence if E satisfies a set of formulas containing φ{x̄ ↦ s̄}, where E ⊨ t̄ ≈ s̄,
then the entailment clearly holds.


Second, we use an incomplete and fast method for inferring when an entailment
holds. We first compute from E congruence classes over T(E). For each
t ∈ T(E), let [t] be the representative of term t in this equivalence relation. For
each function f, we use a term index data structure I_f that stores an entry of
the form ([t₁], ..., [tₙ]) → [f(t₁, ..., tₙ)] ∈ I_f for each uninterpreted function
application f(t₁, ..., tₙ) ∈ T(E). To check the entailment E ⊨ ℓ where ℓ is a
literal, we update ℓ by the following iterative process until a fixed point is reached:


1. Replace each constant t in ℓ with [t].

2. Replace each function term f(t₁, ..., tₙ) in ℓ with s if (t₁, ..., tₙ) → s ∈ I_f.

3. If ℓ is t ≈ t, replace it by ⊤.

4. If ℓ is t ≉ s and t′ ≉ s′ ∈ E where [t′] = t and [s′] = s, replace it by ⊤.


Then, if the resulting ℓ is ⊤, the entailment holds. Although not shown
here, the above process is extended in a straightforward way to handle Boolean
structure, and can also be extended in the presence of other background theories
in a straightforward way by incorporating theory-specific rewriting steps.
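The following C sketch illustrates steps 1 to 4 for a single equality or disequality literal. It assumes the congruence classes have already been computed, so every constant is given directly by its representative, and it stores the term indices I_f for all f in one flat array searched linearly, where an implementation would more likely use a trie or hash table. As a simplification of our own, a side that cannot be rewritten to a representative makes the check answer "not entailed", which keeps it sound but even less complete than the process above. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS 2
#define MAX_ENTRIES 64

/* One entry ([t1],...,[tn]) -> [f(t1,...,tn)] of the term index I_f. */
typedef struct {
    int fn, nargs;
    int args[MAX_ARGS];
    int rep;
} IndexEntry;

/* A term of the literal being checked: a ground constant (fn == -1, given
   by its representative) or an application of fn to sub-terms. */
typedef struct Term {
    int fn;
    int const_rep;
    int nargs;
    const struct Term *args[MAX_ARGS];
} Term;

typedef struct {
    IndexEntry index[MAX_ENTRIES];   /* all I_f, flattened */
    size_t n_index;
    int diseq[MAX_ENTRIES][2];       /* t' != s' in E, as representatives */
    size_t n_diseq;
} Context;

/* Steps 1-2: rewrite t bottom-up to a representative; -1 if some
   application has no entry in the index. */
int to_rep(const Context *ctx, const Term *t) {
    if (t->fn < 0)
        return t->const_rep;
    assert(t->nargs <= MAX_ARGS);
    int reps[MAX_ARGS];
    for (int i = 0; i < t->nargs; i++)
        if ((reps[i] = to_rep(ctx, t->args[i])) < 0)
            return -1;
    for (size_t e = 0; e < ctx->n_index; e++) {
        const IndexEntry *ie = &ctx->index[e];
        int match = ie->fn == t->fn && ie->nargs == t->nargs;
        for (int i = 0; match && i < t->nargs; i++)
            match = ie->args[i] == reps[i];
        if (match)
            return ie->rep;
    }
    return -1;
}

/* Steps 3-4 for l = r (positive != 0) or l != r (positive == 0).
   Returns 1 only if the literal is shown to be entailed by E. */
int entails(const Context *ctx, const Term *l, const Term *r, int positive) {
    int a = to_rep(ctx, l), b = to_rep(ctx, r);
    if (a < 0 || b < 0)
        return 0;
    if (positive)
        return a == b;
    for (size_t d = 0; d < ctx->n_diseq; d++) {
        int x = ctx->diseq[d][0], y = ctx->diseq[d][1];
        if ((x == a && y == b) || (x == b && y == a))
            return 1;
    }
    return 0;
}
```

For instance, with E = {f(a) ≈ b, b ≈ c, a ≉ c}, so that the classes are {a} and {b, c, f(a)}, the check shows E ⊨ f(a) ≈ c and E ⊨ f(a) ≉ a, but cannot conclude f(f(a)) ≈ a, since f applied to the class of b has no index entry.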


Restricting Enumeration Space. Enumerative instantiation can be refined further
by noticing that only a subset of the set of terms T(E) will ever be relevant for
showing unsatisfiability of a quantified formula. An approach in this spirit was
used by Ge and de Moura [19], where decidable fragments were identified by
noticing that the relevant domains of quantified formulas in these fragments are
guaranteed to be finite. In that work, the relevant domain of a quantified formula
∀x̄. φ is computed based on the terms in E and the structure of its body φ. For
example, t is in the relevant domain of function f for all ground terms f(t),
the relevant domain of x for a quantified formula containing the term f(x) is
equal to the relevant domain of f, and so on. A related approach is to use sort
inference [8,9,22] to compute more precise sort information and thus decrease
the number of possible instantiations.
Example 4. Say E ∪ Q = {a ≉ b, f(a) ≈ c} ∪ {∀x. P(f(x))}, where a, b, c, x are of
sort τ, f is a unary function τ → τ, and P is a predicate on τ. It can be shown
that E ∪ Q is equivalent to Eˢ ∪ Qˢ = {a₁ ≉ b₁, f₁₂(a₁) ≈ c₂} ∪ {∀x₁. P₂(f₁₂(x₁))},
where a₁, b₁, x₁ are of sort τ₁, c₂ is of sort τ₂, f₁₂ is of sort τ₁ → τ₂, and P₂ is a
predicate on τ₂.


Sorts can be inferred in this manner using a linear traversal on the input formula
(for details, see for instance Sect. 4 of [22]). This technique narrows the set of
terms considered by enumerative instantiation. In the above example, whereas
enumerative instantiation for E ∪ Q might consider the substitutions {x ↦ c} or
{x ↦ f(c)}, for Eˢ ∪ Qˢ it would not consider {x₁ ↦ c₂} since their sorts are
different, nor would it consider {x₁ ↦ f₁₂(c₂)} since f₁₂(c₂) is not a well-sorted
term. Moreover, the Herbrand universe of an inferred subsort may be finite when
the universe of its parent sort is infinite. In the above example, the Herbrand
universe of τ₁ is {a₁, b₁} and that of τ₂ is {f₁₂(a₁), f₁₂(b₁), c₂}, whereas the
Herbrand universe of τ is infinite.
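One natural way to realise such a linear traversal, sketched here under our own assumptions rather than following [22] in detail, is union-find over sort variables: one for each constant and variable, and one for each argument and result position of each function or predicate symbol. Every (dis)equality between s and t merges the sorts of s and t, and every application f(t) merges the sort of t with the argument position of f; the resulting classes are the inferred subsorts. The identifiers are hypothetical.

```c
#include <assert.h>

#define MAX_NODES 64

/* Union-find over sort variables (hypothetical encoding). */
typedef struct {
    int parent[MAX_NODES];
} SortUF;

void sorts_init(SortUF *uf, int n) {
    assert(n <= MAX_NODES);
    for (int i = 0; i < n; i++)
        uf->parent[i] = i;
}

/* Representative of the inferred subsort of position i. */
int sort_of(SortUF *uf, int i) {
    while (uf->parent[i] != i) {
        uf->parent[i] = uf->parent[uf->parent[i]];  /* path halving */
        i = uf->parent[i];
    }
    return i;
}

/* Record that positions i and j must share the same inferred subsort. */
void same_sort(SortUF *uf, int i, int j) {
    int ri = sort_of(uf, i), rj = sort_of(uf, j);
    if (ri != rj)
        uf->parent[ri] = rj;
}
```

For Example 4, the traversal merges a, b, x and the argument position of f into one class (τ₁), and c, the result position of f and the argument position of P into another (τ₂).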


Compound Strategies. Since the instantiation strategies from this section have
their respective strengths and weaknesses, it is valuable to combine them. We
consider two ways of combining strategies, which we refer to as priority
instantiation and interleaved instantiation. For base strategies s₁ and s₂, priority
instantiation (s₁;s₂) first invokes s₁. If this strategy returns a non-empty set of
substitutions, it returns that set; otherwise it returns the substitutions returned
by s₂. On the other hand, interleaved instantiation (s₁+s₂) returns the union of
the substitutions returned by the two strategies.

Enumerative instantiation is most effective when used as a complement
to heuristic strategies. In particular, we will see in the next section that the
strategies c;e;u and c;e+u are the most effective strategies for unsatisfiable
problems in CVC4.
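A sketch of the two combinators in C, under the illustrative assumption that a set of substitutions is encoded as a bitmask over a fixed list of candidate substitutions. The toy strategies return constant sets taken from Example 1 (bit 0 for {x ↦ a}, bit 1 for {x ↦ b}, bit 2 for {x ↦ c}); they stand in for real strategies and, like every other name here, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: bit i set <=> candidate substitution i returned. */
typedef uint64_t SubstSet;
typedef SubstSet (*Strategy)(const void *input);

/* Priority instantiation (s1;s2): consult s2 only if s1 returns nothing. */
SubstSet priority(Strategy s1, Strategy s2, const void *input) {
    SubstSet r = s1(input);
    return r != 0 ? r : s2(input);
}

/* Interleaved instantiation (s1+s2): union of both results. */
SubstSet interleaved(Strategy s1, Strategy s2, const void *input) {
    return s1(input) | s2(input);
}

/* Toy strategies with the results of Example 1 (input ignored). */
enum { X_TO_A = 1, X_TO_B = 2, X_TO_C = 4 };
static SubstSet conflict_ex1(const void *in)    { (void)in; return X_TO_B; }
static SubstSet ematching_ex1(const void *in)   { (void)in; return X_TO_A | X_TO_B | X_TO_C; }
static SubstSet enumerative_ex1(const void *in) { (void)in; return X_TO_A; }
static SubstSet no_conflict(const void *in)     { (void)in; return 0; }
```

Priority never calls s₂ when s₁ succeeds, which keeps the cheaper, more targeted strategy in control; interleaving pays for both calls but always adds the enumerative candidates alongside the heuristic ones.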


5 Experiments 


This section reports on our experimental evaluation of different strategies based
on enumerative instantiation as implemented in the SMT solver CVC4.² We
present an extensive analysis of enumerative instantiation and compare it with
implementations of model-based instantiation on both unsatisfiable and
satisfiable benchmarks. Experiments were performed on untyped first-order
benchmarks from the TPTP library [33]³, version 6.4.0, and from SMT-LIB [7], as of
October 2017, on logics having quantifiers and either uninterpreted functions or
arrays. For the latter, we considered also logics containing other theories such as


2 For details, see http://matryoshka.gforge.inria.fr/pubs/fol_enumerative_inst/.
3 In SMT parlance, the logic of these benchmarks is quantified EUF. 
Library  #      u      e;u    e+u    e      m      e;m    e+m    uport  mport  port
TPTP     14731  4426   6125   6273   5396   4369   6066   6151   6674   6566   6859
UF       7293   2607   2906   2961   2862   2418   2898   2972   3119   3045   3159
UFDT     4384   1783   1977   1998   1958   1642   1954   1993   2091   2070   2113
UFLIA    7745   3622   5022   5037   4867   2638   4966   4989   5253   5132   5279
UFNIA    3213   1788   1947   1978   1937   1169   1860   1865   2107   2064   2138
Others   4699   2019   2348   2288   2320   966    2338   2312   2400   2363   2404

Total    42065  16245  20325  20535  19340  13202  20082  20282  21644  21240  21952


Fig. 3. CVC4 configurations on unsatisfiable benchmarks with a 300s timeout. 


arithmetic and datatypes. We discarded the 25580 benchmarks that all considered
solver configurations solve in less than 0.1s. In total, 42065 problems were
selected, 14731 from TPTP and 27334 from SMT-LIB. All results were produced on
StarExec [32], a public execution service for running comparative evaluations of
solvers, with a timeout of 300s.

We follow the convention in Sect. 4 for identifying configurations based on
their instantiation strategy. All configurations of CVC4 use conflict-based
instantiation [5,28] with highest priority, so we omit the prefix "c;" from the names
of CVC4 configurations, e.g. e+u in fact means c;e+u. Sort inference, as discussed
in Sect. 4.1, is also used by all configurations of CVC4.


5.1 Impact of Enumerative Instantiation in CVC4 


In this section, we highlight the impact of enumerative instantiation in CVC4
for unsatisfiable benchmarks. Where applicable, we contrast the difference in
the impact of enumerative instantiation and model-based instantiation on the
performance of CVC4 on unsatisfiable benchmarks.⁴


4 There are technical details that influence the comparison of these techniques 
(see [26]). 
The comparison of various instantiation strategies supported by CVC4 is
summarized in Fig. 3. In the table, each row is dedicated to a library and logic.
SMT-LIB is shown at a finer granularity than TPTP to highlight comparisons
of individual strategies. The first column identifies the subset and the second
shows its total number of benchmarks. The next seven columns show the number
of benchmarks found to be unsatisfiable by each configuration. The last three
columns show the results of virtual portfolio solvers, with uport combining e, u,
e;u, and e+u; and mport combining e, m, e;m, and e+m; while port combines
all seven configurations.

First, we can see that u outperforms m, as it solves 3043 more benchmarks
overall. While this is not close to the performance of E-matching (e), it should be
noted that u is highly orthogonal to e, solving 1737 benchmarks that could not
be solved by e⁵. Combining e with either u or m, using either priority or
interleaving instantiation, leads to significant gains in performance. Overall the best
configuration is e+u, that is, the interleaving of enumerative instantiation and
E-matching, which solves 20535 benchmarks, that is, 253 more than its counterpart
e+m interleaving model-based instantiation and E-matching, and 1195 more than
E-matching alone. In the UFLIA logic, the enumerative techniques are
especially effective in comparison with the model-based ones. In particular, they
enable CVC4 to solve previously intractable problems, e.g. the family "sexpr"
with 32 problems. These are notoriously hard problems involving the verification
of C# programs using Spec# [6]. Z3 can solve 31 of them thanks to its advanced
optimizations of E-matching [13]. CVC4 previously could solve at most 16 using
techniques combining e and m, but u alone could solve 27, and all 32 are
solved by e+u. Another example is the family "vcc-havoc" in UFNIA, stemming
from the verification of concurrent C with VCC [10]. The strategy e+u
solves 940 out of 984 problems, outperforming e and its combinations with m,
which solve at most 860 problems⁶.

The portfolio columns of the table in Fig. 3 highlight the improvement due
to enumerative instantiation for CVC4 on the number of solved problems: 712
more problems are solved overall when enumerative instantiation is added to the
strategies (see columns mport and port). The cactus plot of Fig. 3 shows
that while the priority strategies are initially quicker, the interleaving ones scale
better, solving more hard problems than their priority counterparts. Overall, we
conclude that in addition to being much simpler to implement⁷, instantiation
strategies that combine E-matching with enumerative instantiation in CVC4
have a noticeable advantage over those that combine E-matching with
model-based instantiation on unsatisfiable problems.


5 Numbers of uniquely solved benchmarks between configurations are available in [26].

6 A detailed comparison by families can be seen in [26].

7 As a rough estimate, the implementation of enumerative instantiation in CVC4 is
around 500 lines of code, whereas that of model-based instantiation is around 4500
lines of code.




5.2 Comparison Against Other SMT Solvers 


In this section, we compare our implementation of enumerative instantiation 
in CVC4 against another state-of-the-art SMT solver: Z3 [14] (version 4.5.1) 
which, like CVC4, also relies on E-matching instantiation for handling unsat- 
isfiable problems. Before making the comparison, we first summarize the main 
differences between Z3 and CVC4 here. Z3 uses several optimizations for E- 
matching that are not implemented in CVC4, including the use of code trees 
and techniques for applying instantiation incrementally during the CDCL(T)
search (see Sect. 5 of [13]). It also implements techniques for removing previously 
considered instantiations from its set of known clauses (see Sect. 7 of [13]). The 
main advantage of CVC4 with respect to Z3 is its use of conflict-based instanti- 
ation c [28], which is enabled by default in all strategies we considered. It also 
supports interleaved instantiation strategies as described in Sect. 4.1, whereas Z3 
does not. In addition to these differences, Z3 implements model-based instanti- 
ation m as described in [19], whereas CVC4 implements model-based instanti- 
ation as described in [29]. Finally, CVC4 implements enumerative instantiation 
as described in this paper, which we compare as an alternative to these imple- 
mentations. 


[Fig. 4 cactus plot, not recoverable from the extraction: CPU time (s) against number of solved benchmarks for uport-i, mport-i, z3 mport-i and e.]

Library  #      z3 m   z3 e   z3 e;m  z3 mport-i  e      uport-i  mport-i
TPTP     14731  2382   4098   5288    5519        5396   6519     6396
UF       7293   1192   2428   2516    2600        2862   3076     2982
UFDT     4384   838    1702   1721    1781        1958   2062     2036
UFLIA    7745   2460   4751   4841    4923        4867   5164     5049
UFNIA    3213   1089   2074   2112    2238        1937   2091     2015
Others   4699   990    2226   2332    2346        2320   2393     2357

Total    42065  8951   17279  18810   19407       19340  21305    20835


Fig. 4. Z3 and CVC4 on unsatisfiable benchmarks with a 300s timeout. 
Figure 4 summarizes the performance of Z3 on our benchmark set. First,
as in CVC4, using model-based instantiation to complement E-matching leads to
significant gains in Z3, as z3 e;m solves a total of 1531 more benchmarks than
E-matching alone (z3 e). In comparison with CVC4, the configuration z3 e
outperforms e in the logics with non-linear arithmetic and other theories, while
e is better in the others. Finally, Z3's implementation of model-based quantifier
instantiation by itself (z3 m) is not effective for unsatisfiable benchmarks, solving
only 8951 overall.

To further compare Z3 and CVC4, the column e gives the number of benchmarks
solved by CVC4's E-matching strategy, which we gave in Fig. 3. The
second-to-last column, uport-i, gives the number of benchmarks solved by at least
one of u, e, or e;u in CVC4, where we intentionally omit the interleaved strategy
e+u, since Z3 does not support a similar strategy. The column mport-i is computed
similarly. We compare these with the column z3 mport-i, i.e. the number of
benchmarks solved by either z3 m, z3 e or z3 e;m. A comparison of these is given
in the cactus plot of Fig. 4. We can see that due to Z3's highly optimized
implementations, z3 mport-i solves the highest number of problems in less than
one second (around 13000), whereas the portfolio strategies of CVC4 solve more
for larger timeouts. Overall, the best
portfolio strategy is the one based on enumerative instantiation in CVC4
(uport-i), which solves a total of 21305 unsatisfiable benchmarks, that is, 1898
more than z3 mport-i and 470 more than mport-i. We thus conclude that
the use of enumerative instantiation when paired with E-matching and
conflict-based instantiation in CVC4 improves the state of the art of
instantiation-based SMT solvers for unsatisfiable benchmarks.


Comparison with Automated Theorem Provers. Automated theorem provers like 
Vampire [23] and E [31] use substantially different techniques based on super- 
position, hence we do not provide an extensive comparison here. However, we 
do remark that the gains provided by enumerative instantiation were one of 
the main reasons for CVC4 being more competitive in the 2017 CASC com- 
petition of automatic theorem provers [34]. CVC4 placed third in the category 
with unsatisfiable problems on the empty theory, as in previous years, behind 
superposition-based theorem provers Vampire and E, which implement complete 
strategies. There was, however, a 23% reduction in the number of problems that 
E solves and CVC4 does not, w.r.t. the previous competition, reducing the gap 
between the two systems. 


Satisfiable Benchmarks. For satisfiable benchmarks⁸, m solves 1350 benchmarks
across all theories. As expected, this is much higher than the number solved by 


8 For further details, see [26]. 
u, which solves 510 benchmarks, all from the empty theory. Nevertheless, there
are 13 satisfiable problems solved by u and not by m, which shows that enu- 
merative instantiation has some orthogonality on satisfiable benchmarks as well. 
We conclude that enumeration not only has superior performance to MBQI on 
unsatisfiable benchmarks, but also can be an alternative for satisfiable bench- 
marks in the empty theory. 


5.3 Artifact 


We have produced an artifact [27] to reproduce the experimental results pre- 
sented in this paper. The artifact contains the binaries of the SMT solvers CVC4 
and Z3, the benchmarks on which they were evaluated, and the running scripts 
for each configuration evaluated. Detailed instructions are given to perform tests 
on the various benchmark families with all configurations within the time limits, 
as well as for retrieving the respective results in CSV format. The artifact has 
been tested in the virtual machine available at [21]. 


6 Conclusion 


We have presented a strengthening of the Herbrand Theorem, and used it to
devise an efficient technique for enumerative instantiation. The implementation
of this technique in the state-of-the-art SMT solver CVC4 increases its success
rate and outperforms existing implementations of MBQI on unsatisfiable
problems with quantified formulas. Given its relatively simple implementation,
this technique is well poised as an alternative to MBQI for integration in an
instantiation-based SMT solver, both to achieve completeness in first-order logic
with the empty theory and equality, and to improve performance when
theories are considered.

Future work includes further restricting the enumeration space, for instance 
with ordering criteria in the spirit of resolution-based theorem proving [3]. 
Another direction is lifting the techniques seen here to reasoning in higher-order 
logic. To handle quantification over functions it is often necessary to enumerate 
expressions, and so performing such an enumeration in a principled manner is 
paramount for this domain. Techniques from syntax-guided function synthesis [1] 
could be combined with enumerative instantiation to pursue this goal. 
Abstract. State-of-the-art (semi-)decision procedures for non-linear
real arithmetic address polynomial inequalities by means of symbolic
methods, such as quantifier elimination, or numerical approaches such
as interval arithmetic. Although (some of) these methods offer nice
completeness properties, their high complexity remains a limit, despite the
impressive efficiency of modern implementations. This appears to be an
obstacle to the use of SMT solvers when verifying, for instance,
functional properties of control-command programs.

Using off-the-shelf convex optimization solvers is known to constitute 
an appealing alternative. However, these solvers only deliver approxi- 
mate solutions, which means they do not readily provide the sound- 
ness expected for applications such as software verification. We thus 
investigate a-posteriori validation methods and their integration in the 
SMT framework. Although our early prototype, implemented in the Alt- 
Ergo SMT solver, often does not prove competitive with state of the art 
solvers, it already gives some interesting results, particularly on control- 
command programs. 


Keywords: SMT · Non-linear real arithmetic · Polynomial inequalities · Convex optimization


1 Introduction 


Systems of non-linear polynomial constraints over the reals are known to be
solvable since Tarski proved that the first-order theory of the real numbers is
decidable, by providing a quantifier elimination procedure. This procedure has
since been much improved, particularly with the cylindrical algebraic decomposition.
Unfortunately, its doubly exponential complexity remains a serious limit to its
scalability. Cylindrical algebraic decomposition is now integrated into SMT
solvers [23]. Although it demonstrates very good practical results, symbolic
quantifier elimination seems to remain an obstacle to scalability on some problems.
In some cases, branch and bound with interval arithmetic constitutes an
interesting alternative [17].

We investigate the use of numerical optimization techniques, called
semi-definite programming, as an alternative. We show in this paper how solvers
based on these techniques can be used to design a sound semi-decision
procedure that outperforms symbolic and interval-arithmetic methods on problems of
practical interest. A notable characteristic of the algorithms implemented in
these solvers is that they only compute approximate solutions.

We explain this by making a comparison with linear programming. There are 
two competitive methods to optimize a linear objective under linear constraints: 
the interior point and the simplex algorithms. The interior point algorithm starts 
from some initial point and performs steps towards an optimal value. These 
iterations converge to the optimum but not in finitely many steps and have to be 
stopped at some point, yielding an approximate answer. In contrast, the simplex 
algorithm exploits the fact that the feasible set is a polyhedron and that the
optimum is achieved on one of its vertices. The number of vertices being finite, 
the optimum can be exactly reached after finitely many iterations. Unfortunately, 
this nice property does not hold for spectrahedra, the equivalent of polyhedra 
for semi-definite programming. Thus, all semi-definite programming solvers are 
based on the interior-point algorithm, or a variant thereof. 

To illustrate the consequences of these approximate solutions, consider the
proof of e < c with e a complicated ground expression and c a constant. e < c can
be proved by exactly computing e, giving a constant c′, and checking that c′ < c.
However, if e is only approximately computed, i.e. e ∈ [c′ − ε, c′ + ε], this is
conclusive only when c′ + ε < c. In particular, if e is equal to c, an exact
computation is required. This inability to prove inequalities that are not satisfied
with some margin is a well-known property of numerical verification methods [42],
which can then be seen as a trade-off between completeness and computation cost.
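The argument can be made concrete with a tiny C check. It is a sketch under stated assumptions: c_approx is an approximation of e known to satisfy |e − c_approx| ≤ eps, and the floating-point addition c_approx + eps is treated as exact, which a sound implementation would have to guarantee (for example with directed rounding or rational arithmetic). The names prove_lt and Verdict are hypothetical.

```c
#include <assert.h>

typedef enum { VERDICT_PROVED, VERDICT_UNKNOWN } Verdict;

/* Tries to prove e < c from an approximation c_approx of e with
   |e - c_approx| <= eps: conclusive only if the whole interval
   [c_approx - eps, c_approx + eps] lies strictly below c.
   Assumes the addition below is exact (no rounding error). */
Verdict prove_lt(double c_approx, double eps, double c) {
    assert(eps >= 0.0);
    return c_approx + eps < c ? VERDICT_PROVED : VERDICT_UNKNOWN;
}
```

When e actually equals c, no positive eps can make the check succeed, which is the margin problem described above.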

The main point of this paper is that, despite their incompleteness, numerical
verification methods remain an interesting option when they make it possible to
solve in practice problems for which other methods have intractable complexity.
Our contributions are:


(1) a comparison of two sound semi-decision procedures for systems of non-linear 
constraints, which rely on off-the-shelf numerical optimization solvers, 

(2) an integration of these procedures in the Alt-Ergo SMT solver, 

(3) an experimental evaluation of our approach on a set of benchmarks coming 
from various application domains. 


The rest of this paper is organized as follows: Sect. 2 gives a practical example
of a polynomial problem, coming from control-command program verification,
that is better handled by numerical methods. Section 3 is dedicated to
preliminaries. It introduces basic concepts of sum-of-squares polynomials and
semi-definite programming. In Sect. 4, we compare two methods to derive sound
solutions to polynomial problems from approximate answers of semi-definite programming
typedef struct { double x0, x1, x2; } state;

/*@ predicate inv(state *s) = 6.04 * s->x0 * s->x0 - 9.65 * s->x0 * s->x1
  @   - 2.26 * s->x0 * s->x2 + 11.36 * s->x1 * s->x1
  @   + 2.67 * s->x1 * s->x2 + 3.76 * s->x2 * s->x2 <= 1; */

/*@ requires \valid(s) && inv(s) && -1 <= in0 <= 1;
  @ ensures inv(s); */
void step(state *s, double in0) {
  double pre_x0 = s->x0, pre_x1 = s->x1, pre_x2 = s->x2;
  s->x0 = 0.9379 * pre_x0 - 0.0381 * pre_x1 - 0.0414 * pre_x2 + 0.0237 * in0;
  s->x1 = -0.0404 * pre_x0 + 0.968 * pre_x1 - 0.0179 * pre_x2 + 0.0143 * in0;
  s->x2 = 0.0142 * pre_x0 - 0.0197 * pre_x1 + 0.9823 * pre_x2 + 0.0077 * in0;
}


Fig. 1. Example of a typical control-command code in C. 


solvers. Section 5 provides some implementation details and discusses experimental
results. Finally, Sect. 6 concludes with related and future work.


2 Example: Control-Command Program Verification 


Control-command programs usually iterate linear assignments periodically over
time. These assignments take into account a measure (via some sensor) of the
state of the physical system to control (called plant by control theorists) to
update an internal state and eventually output orders back to the physical system
(through some actuator). Figure 1 gives an example of such an update, in0
being the input and s the internal state. The comments beginning with @ in the
example are annotations in the ACSL language [12]. They specify that before
the execution of the function (requires), s must be a valid pointer satisfying the
predicate inv and |in0| ≤ 1 must hold. Under these hypotheses, s still satisfies
inv after executing the function (ensures).

To prove that the internal state remains bounded over any execution of the
system, a quadratic polynomial¹ can be used as an invariant². Checking the validity
of these invariants then leads to arithmetic verification conditions (VCs) involving
quadratic polynomials. Such VCs can for instance be generated from the
program of Fig. 1 by the Frama-C/Why3 program verification toolchain [12, 16].
Unfortunately, proving the validity of these VCs seems out of reach for current
state-of-the-art SMT solvers. For instance, although Z3 [13] can solve smaller
examples with just two internal state variables in a matter of seconds, it ran for
a few days on the three-variable example of Fig. 1 without reaching
a conclusion³. In contrast, our prototype can prove it in a fraction of a second, as
well as other examples with up to a dozen variables.


1 For instance, the three-variable polynomial in inv in Fig. 1.

2 Control theorists call these invariants sublevel sets of a quadratic Lyapunov function.
Such functions exist for linear systems if and only if they do not diverge.

3 This is the case even on a simplified version with only arithmetic constructs, i.e.,
stripped of all the reasoning about pointers and the C memory model.
Verification of control-command programs is a good candidate for numerical
methods. These systems are designed to be robust to many small errors, which
means that the verified properties are usually satisfied with some margin. Thus,
the incompleteness of numerical methods is not an issue for this kind of problem.


3 Preliminaries 


3.1 Emptiness of Semi-algebraic Sets 


Our goal is to prove that conjunctions of polynomial inequalities are unsatisfiable,
that is, given some polynomials with real coefficients $p_1, \dots, p_m \in \mathbb{R}[x]$,
we want to prove that there does not exist any assignment for the
$n$ variables $(x_1, \dots, x_n) \in \mathbb{R}^n$ such that all inequalities
$p_1(x_1, \dots, x_n) \ge 0, \dots, p_m(x_1, \dots, x_n) \ge 0$ hold simultaneously. In the rest of this paper, the
notation $p \ge 0$ (resp. $p > 0$) means that for all $x \in \mathbb{R}^n$, $p(x) \ge 0$ (resp. $p(x) > 0$).


Theorem 1. If there exist polynomials $r_i \in \mathbb{R}[x]$ such that

$$-\sum_i r_i p_i > 0 \quad \text{and} \quad \forall i,\ r_i \ge 0 \qquad (1)$$

then the conjunction $\bigwedge_i p_i \ge 0$ is unsatisfiable⁴.


Proof. Assume there exists $x \in \mathbb{R}^n$ such that for all $i$, $p_i(x) \ge 0$. Then, since
$r_i \ge 0$, we have $r_i(x)\,p_i(x) \ge 0$, hence $\left(\sum_i r_i p_i\right)(x) \ge 0$, which contradicts
$-\sum_i r_i p_i > 0$.
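As a small illustration of Theorem 1 (our own example, not one of the benchmarks), take the non-linear constraints $p_1 := 1 - x^2 \ge 0$ and $p_2 := x^2 - 2 \ge 0$. The constant multipliers $r_1 = r_2 = 1$ satisfy $r_i \ge 0$, and

$$-\left(r_1 p_1 + r_2 p_2\right) = -\left((1 - x^2) + (x^2 - 2)\right) = 1 > 0,$$

so the conjunction $p_1 \ge 0 \wedge p_2 \ge 0$ is unsatisfiable. Here $d = 2$ and both $r_i$ have degree $d - \deg(p_i) = 0$.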


In fact, under some hypotheses⁵ on the $p_i$, condition (1) is not only
sufficient but also necessary, as stated by Putinar's Positivstellensatz [27,
Sect. 2.5.1]. Unfortunately, no practical bound is known on the degrees of the
polynomials $r_i$. In our prototype, we restrict the degree of each $r_i$ to⁶ $d - \deg(p_i)$
where $d := \max_i \deg(p_i)$, so that $\sum_i r_i p_i$ is a polynomial of degree $d$. This is a
first source of incompleteness, although benchmarks show that it already makes it
possible to solve many interesting problems.

The sum of squares (SOS) technique [26, 36] is an efficient way to numerically 
solve polynomial problems such as (1). The next sections recall its main ideas. 


3.2 Sum of Squares (SOS) Polynomials

A polynomial $p \in \mathbb{R}[x]$ is said to be SOS if there exist polynomials $h_i \in \mathbb{R}[x]$
such that for all $x$,

$$p(x) = \sum_i h_i(x)^2.$$

Although not all non-negative polynomials are SOS, being SOS is a sufficient
condition to be non-negative.


4 Or, in other words, the semi-algebraic set $\{x \in \mathbb{R}^n \mid \forall i,\ p_i(x) \ge 0\}$ is empty.
5 For instance, when one of the sets $\{x \in \mathbb{R}^n \mid p_i(x) \ge 0\}$ is bounded.

6 More precisely, to $2\left\lfloor \frac{d - \deg(p_i)}{2} \right\rfloor$, as $\deg(r_i)$ is necessarily even since $r_i \ge 0$.
Example 1 (from [36]). Considering $p(x_1, x_2) = 2x_1^4 + 2x_1^3 x_2 - x_1^2 x_2^2 + 5x_2^4$, there
exist $h_1(x_1, x_2) = \frac{1}{\sqrt{2}}\left(2x_1^2 - 3x_2^2 + x_1 x_2\right)$ and $h_2(x_1, x_2) = \frac{1}{\sqrt{2}}\left(x_2^2 + 3x_1 x_2\right)$
such that $p = h_1^2 + h_2^2$. This proves that for all $x_1, x_2 \in \mathbb{R}$, $p(x_1, x_2) \ge 0$.


Any polynomial $p$ of degree $2d$ (a non-negative polynomial is necessarily of
even degree) can be written as a quadratic form in the vector of all monomials
of degree less than or equal to $d$:

$$p(x) = z^T Q z \qquad (2)$$

where $z = [1, x_1, \dots, x_n, x_1 x_2, \dots, x_n^d]^T$ and $Q$ is a constant symmetric matrix.


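Example 2. For $p(x_1, x_2) = 2x_1^4 + 2x_1^3 x_2 - x_1^2 x_2^2 + 5x_2^4$, we have⁷

$$p(x_1, x_2) = \begin{bmatrix} x_1^2 \\ x_2^2 \\ x_1 x_2 \end{bmatrix}^T \begin{bmatrix} q_{11} & q_{12} & q_{13} \\ q_{12} & q_{22} & q_{23} \\ q_{13} & q_{23} & q_{33} \end{bmatrix} \begin{bmatrix} x_1^2 \\ x_2^2 \\ x_1 x_2 \end{bmatrix}$$

$$= q_{11} x_1^4 + 2q_{13} x_1^3 x_2 + (q_{33} + 2q_{12}) x_1^2 x_2^2 + 2q_{23} x_1 x_2^3 + q_{22} x_2^4.$$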


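Thus $q_{11} = 2$, $2q_{13} = 2$, $q_{33} + 2q_{12} = -1$, $2q_{23} = 0$ and $q_{22} = 5$. Two possible
examples for the matrix $Q$ are shown below:

$$Q = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 5 & 0 \\ 1 & 0 & -3 \end{bmatrix}, \qquad Q' = \begin{bmatrix} 2 & -3 & 1 \\ -3 & 5 & 0 \\ 1 & 0 & 5 \end{bmatrix}.$$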
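The coefficient matching of Example 2 is mechanical. The following C sketch (our own illustration; `gram_to_coeffs` and `represents_p` are hypothetical names) maps a row-major 3×3 matrix $Q$ to the coefficients of $z^T Q z$ for $z = (x_1^2, x_2^2, x_1 x_2)$ and checks that both $Q$ and $Q'$ represent $p$.

```c
#include <assert.h>

/* Coefficients of z^T Q z for z = (x1^2, x2^2, x1*x2), Q row-major 3x3,
   in the monomial order x1^4, x1^3*x2, x1^2*x2^2, x1*x2^3, x2^4. */
void gram_to_coeffs(const double *q, double *c) {
    /* Q must be symmetric for the factors of 2 below to be valid. */
    assert(q[1] == q[3] && q[2] == q[6] && q[5] == q[7]);
    c[0] = q[0];              /* x1^4      : q11         */
    c[1] = 2.0 * q[2];        /* x1^3 x2   : 2 q13       */
    c[2] = q[8] + 2.0 * q[1]; /* x1^2 x2^2 : q33 + 2 q12 */
    c[3] = 2.0 * q[5];        /* x1 x2^3   : 2 q23       */
    c[4] = q[4];              /* x2^4      : q22         */
}

/* 1 if z^T Q z equals p = 2x1^4 + 2x1^3x2 - x1^2x2^2 + 5x2^4, else 0. */
int represents_p(const double *q) {
    static const double p[5] = {2.0, 2.0, -1.0, 0.0, 5.0};
    double c[5];
    gram_to_coeffs(q, c);
    for (int k = 0; k < 5; k++)
        if (c[k] != p[k]) return 0;
    return 1;
}
```

Since $Q$ and $Q'$ satisfy the same affine constraints, this check alone cannot tell them apart; positive semi-definiteness is the additional condition that matters.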


The polynomial $p$ is then SOS if and only if there exists a positive semi-definite
matrix $Q$ satisfying (2). A matrix $Q$ is called positive semi-definite, noted
$Q \succeq 0$, if, for all vectors $x$, $x^T Q x \ge 0$. Just as a scalar $q \in \mathbb{R}$ is non-negative
if and only if $q = r^2$ for some $r \in \mathbb{R}$ (typically $r = \sqrt{q}$), $Q \succeq 0$ if and only if
$Q = R^T R$ for some matrix $R$ (then, for all $x$, $x^T Q x = (Rx)^T (Rx) = \|Rx\|_2^2 \ge 0$).
The vector $Rz$ is then a vector of polynomials $h_i$ such that $p = \sum_i h_i^2$.


Example 3. In the previous example, the matrix $Q$ is not positive semi-definite
(for $x = [0, 0, 1]^T$, $x^T Q x = -3$). In contrast, $Q' \succeq 0$ as $Q' = R^T R$ with

$$R = \frac{1}{\sqrt{2}} \begin{bmatrix} 2 & -3 & 1 \\ 0 & 1 & 3 \end{bmatrix},$$

giving the decomposition of Example 1.
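The factorization of Example 3 can be checked numerically. The C sketch below (illustrative; `gram` and `max_abs_diff` are names we introduce) computes $R^T R$ for a row-major matrix. To stay in exact integer arithmetic it uses $\sqrt{2}R$, whose Gram matrix must equal $2Q'$.

```c
#include <assert.h>

/* G = R^T R, where R is rows x cols (row-major) and G is cols x cols. */
void gram(const double *r, int rows, int cols, double *g) {
    assert(rows > 0 && cols > 0);
    for (int i = 0; i < cols; i++)
        for (int j = 0; j < cols; j++) {
            double acc = 0.0;
            for (int k = 0; k < rows; k++)
                acc += r[k * cols + i] * r[k * cols + j];
            g[i * cols + j] = acc;
        }
}

/* Largest absolute entry of A - B for n x n row-major matrices. */
double max_abs_diff(const double *a, const double *b, int n) {
    double m = 0.0;
    for (int k = 0; k < n * n; k++) {
        double d = a[k] - b[k];
        if (d < 0) d = -d;
        if (d > m) m = d;
    }
    return m;
}
```

Multiplying the two rows of $\sqrt{2}R$ by $z$ gives $\sqrt{2}\,h_1$ and $\sqrt{2}\,h_2$ from Example 1.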


3.3 Semi-Definite Programming (SDP) 


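Given symmetric matrices $C, A_1, \dots, A_m \in \mathbb{R}^{s \times s}$ and scalars $a_1, \dots, a_m \in \mathbb{R}$,
the following optimization problem is called semi-definite programming:

$$\begin{array}{ll} \text{minimize} & \operatorname{tr}(CQ) \\ \text{subject to} & \operatorname{tr}(A_1 Q) = a_1 \\ & \quad \vdots \\ & \operatorname{tr}(A_m Q) = a_m \\ & Q \succeq 0 \end{array} \qquad (3)$$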


7 All monomials of $p$ are of degree 4, so $z$ does not need to contain $1$, $x_1$ and $x_2$.
where the symmetric matrix $Q \in \mathbb{R}^{s \times s}$ is the variable, $\operatorname{tr}(M) = \sum_i M_{i,i}$ denotes
the trace of the matrix $M$ and $Q \succeq 0$ means $Q$ positive semi-definite.


Remark 1. Since the matrices are symmetric, $\operatorname{tr}(AQ) = \operatorname{tr}(A^T Q) = \sum_{i,j} A_{i,j} Q_{i,j}$.
The constraints $\operatorname{tr}(AQ) = a$ are then affine constraints between
the entries of $Q$.
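To make Remark 1 concrete on Example 2 (an illustration we add here), the coefficient constraint $q_{33} + 2q_{12} = -1$ is the trace constraint

$$\operatorname{tr}(A_3 Q) = \sum_{i,j} (A_3)_{i,j} Q_{i,j} = 2q_{12} + q_{33} = -1, \qquad A_3 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

and the four other coefficients of $p$ give $A_1 = e_1 e_1^T$ (for $q_{11} = 2$), $A_2 = e_1 e_3^T + e_3 e_1^T$ (for $2q_{13} = 2$), $A_4 = e_2 e_3^T + e_3 e_2^T$ (for $2q_{23} = 0$) and $A_5 = e_2 e_2^T$ (for $q_{22} = 5$), where $e_i$ denotes the $i$-th unit vector (notation used only for this illustration).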


As we have just seen in Sect. 3.2, the existence of an SOS decomposition amounts
to the existence of a positive semi-definite matrix satisfying a set of affine constraints,
that is, a solution of a semi-definite program. Semi-definite programming is a
convex optimization problem for which efficient numerical solvers exist [7, 44],
thus making it possible to solve problems involving polynomial inequalities over the
reals.


3.4 Parametric Problems 


Up to now, we have only seen how to check whether a given polynomial $p$ with
fixed coefficients is SOS (which implies its non-negativity). However, according
to Sect. 3.1, we need to solve problems in which polynomials have coefficients
that are not fixed but parameters. One of the great strengths of SOS programming
is its ability to solve such problems.

An unknown polynomial $p \in \mathbb{R}[x]$ of degree $d$ with $n$ variables can be written

$$p = \sum_{\alpha_1 + \dots + \alpha_n \le d} p_\alpha\, x_1^{\alpha_1} \cdots x_n^{\alpha_n}$$

where the $p_\alpha$ are scalar parameters. A constraint such as $r_i \ge 0$ in (1) can then
be replaced by "$r_i$ is SOS", that is: $\exists Q \succeq 0,\ r_i = z^T Q z$, which is a set of affine
equalities between the coefficients of $Q$ and the coefficients $r_{i,\alpha}$ of $r_i$. This can
be cast as a semi-definite programming problem⁸.

Thus, problems with unknown polynomials, such as the one presented in
Sect. 3.1, can be numerically solved through SOS programming.


Remark 2 (Complexity). The number $s$ of monomials in $n$ variables of degree less
than or equal to $d$, i.e., the size of the vector $z$ in the decomposition $p(x) = z^T Q z$,
is $s := \binom{n+d}{d}$. This is polynomial in $n$ for a fixed $d$ (and vice versa). In practice,
current SDP solvers can solve problems where $s$ is about a few hundred. This
makes the SOS relaxation tractable for small values of $n$ and $d$ ($n \approx 10$ and
$d \approx 3$, for instance). Our benchmarks indicate this is already enough to solve
some practical problems that remain out of reach for other methods.
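The size $s$ from Remark 2 can be computed without factorials. The following C sketch (the name `monomial_count` is ours) uses the recurrence $\binom{n+k}{k} = \binom{n+k-1}{k-1} \cdot \frac{n+k}{k}$, in which each division is exact. For $n = 10$ and $d = 3$ it gives $s = 286$, in line with the "few hundred" order of magnitude quoted in the remark.

```c
#include <assert.h>
#include <limits.h>

/* s = C(n + d, d): number of monomials in n variables of degree <= d.
   Uses C(n+k, k) = C(n+k-1, k-1) * (n+k) / k; the division is exact. */
unsigned long long monomial_count(unsigned n, unsigned d) {
    unsigned long long s = 1; /* C(n, 0) */
    for (unsigned k = 1; k <= d; k++) {
        assert(s <= ULLONG_MAX / (n + k)); /* guard against overflow */
        s = s * (n + k) / k;
    }
    return s;
}
```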


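8 By encoding the $r_{i,\alpha} \in \mathbb{R}$ as $r_{i,\alpha} = r^+_{i,\alpha} - r^-_{i,\alpha}$ with $r^+_{i,\alpha}, r^-_{i,\alpha} \ge 0$ and putting the new
variables in a block diagonal matrix variable $Q' := \operatorname{diag}(Q, \dots, r^+_{i,\alpha}, r^-_{i,\alpha}, \dots)$.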
4 Numerical Verification of SOS


According to Sect. 3.1, a conjunction of polynomial constraints can be proved
unsatisfiable by exhibiting other polynomials satisfying some constraints.
Section 3.4 shows that such polynomials can be efficiently found by numerical
optimization solvers. Unfortunately, due to the algorithms they implement,
we cannot directly trust the results of these solvers. This section details this
issue and reviews two a posteriori validation methods, with their respective
weaknesses.


4.1 Approximate Solutions from SDP Solvers 


In practice, the matrix $Q$ returned by SDP solvers upon solving an SDP problem
(3) does not precisely satisfy the equality constraints, due both to the algorithms
used and to their implementation with floating-point arithmetic. Therefore,
even when the SDP solver returns a positive answer for an SOS program, this does
not constitute a valid proof that a given polynomial is SOS.

Most SDP solvers start from some $Q \succ 0$ not satisfying the equality constraints
(for instance the identity matrix) and iteratively modify it in order
to reduce the distance between $\operatorname{tr}(A_i Q)$ and $a_i$ while keeping $Q$ positive
semi-definite. This process is stopped when the distance is deemed small enough. This
final distance $\varepsilon$ is called the primal infeasibility, and is one of the result quality
measures displayed by SDP solvers⁹. Therefore, we do not obtain a $Q$ satisfying
$\operatorname{tr}(A_i Q) = a_i$ but rather $\operatorname{tr}(A_i Q) = a_i + e_i$ for some small $e_i$ such that $|e_i| \le \varepsilon$.
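The primal infeasibility can be estimated directly from the constraints. The C sketch below (our illustration; `trace_prod` and `primal_infeasibility` are hypothetical names) evaluates $\max_i |\operatorname{tr}(A_i Q) - a_i|$ with the entrywise formula of Remark 1. It uses plain floating point, so the result is only an estimate, not a certified bound.

```c
#include <assert.h>

/* tr(A Q) for symmetric s x s row-major matrices; by Remark 1 it equals
   the entrywise sum of A_ij * Q_ij. */
double trace_prod(const double *a, const double *q, int s) {
    double t = 0.0;
    for (int k = 0; k < s * s; k++) t += a[k] * q[k];
    return t;
}

/* Estimate of max_i |tr(A_i Q) - a_i| over m constraints; A stores the m
   matrices one after another. Plain floating point: not a certified bound. */
double primal_infeasibility(const double *A, const double *a, int m,
                            const double *q, int s) {
    assert(m >= 0 && s > 0);
    double eps = 0.0;
    for (int i = 0; i < m; i++) {
        double e = trace_prod(A + i * s * s, q, s) - a[i];
        if (e < 0) e = -e;
        if (e > eps) eps = e;
    }
    return eps;
}
```

With the five constraint matrices of Example 2 and a $Q'$ whose entries $q_{12} = q_{21}$ are perturbed by $10^{-9}$, only the $x_1^2 x_2^2$ constraint is violated, by about $2 \cdot 10^{-9}$.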


4.2 Proving Existence of a Nearby Solution 


This primal infeasibility has a simple translation in terms of our original SOS
problem. The polynomial equality $p = z^T Q z$ is encoded as one scalar constraint
$\operatorname{tr}(A_i Q) = a_i$ for each coefficient $a_i$ of the polynomial $p$ (cf. Example 2). Hence the
coefficients of the polynomials $p$ and $z^T Q z$ differ by some $e_i$ and, since $|e_i| \le \varepsilon$,
there exists a matrix $E \in \mathbb{R}^{s \times s}$ such that, for all $i, j$, $|E_{i,j}| \le \varepsilon$ and

$$p = z^T (Q + E) z. \qquad (4)$$


Proving that $Q + E \succeq 0$ is now enough to prove that the polynomial $p$ is SOS,
hence non-negative. A sufficient condition is to check¹⁰ $Q - s\varepsilon I \succeq 0$.
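The sufficient condition follows from a standard norm bound (a derivation we add for completeness). Every entry of $E$ is at most $\varepsilon$ in absolute value, so $\|E\|_2 \le \|E\|_F = \sqrt{\sum_{i,j} E_{i,j}^2} \le s\varepsilon$, and for every $x \in \mathbb{R}^s$,

$$x^T (Q + E) x \ge x^T Q x - \|E\|_2 \|x\|_2^2 \ge x^T Q x - s\varepsilon \|x\|_2^2 = x^T (Q - s\varepsilon I) x.$$

Hence $Q - s\varepsilon I \succeq 0$ implies $Q + E \succeq 0$.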

As seen in Sect. 3.2, checking that a matrix $M$ is positive semi-definite
amounts to exhibiting a matrix $R$ such that $M = R^T R$. The Cholesky decomposition
algorithm [45, Sect. 1.4] computes such a matrix $R$. Given a matrix
$M \in \mathbb{R}^{s \times s}$, it attempts to compute $R$ such that $M = R^T R$ and, when $M$ is not
positive semi-definite, it fails by attempting to take the square root of a negative
value or to perform a division by zero.


9 Typically, $\varepsilon \approx 10^{-8}$.
10 In order to get a good likelihood for this to hold, we ask the SDP solver for $Q - 2s\varepsilon I \succeq 0$
rather than $Q \succeq 0$, as solvers often return matrices $Q$ that are slightly not positive definite.


An SMT Solver for Control-Command Software Verification 139 


Due to rounding errors, a simple floating-point Cholesky decomposition
would produce a matrix $R$ not exactly satisfying the equality $M = R^T R$, hence
not proving $M \succeq 0$. However, these rounding errors can be bounded by a matrix
$B$ so that, when the floating-point Cholesky decomposition of $M - B$ succeeds,
$M \succeq 0$ is guaranteed to hold. Moreover, $B$ can be easily computed from
the matrix $M$ and the characteristics of the floating-point format used [41].

To sum up, the following verification procedure can prove that a given polynomial
$p$ is SOS¹¹.

Let $Q \in \mathbb{R}^{s \times s}$ be the approximate solution returned by an SDP
solver for the problem $p = z^T Q z \wedge Q \succeq 0$. Then,

1. Compute a bound $\varepsilon$ on the coefficients of $p - z^T Q z$.
2. Check that $Q - s\varepsilon I \succeq 0$.
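The two steps can be sketched in C on Example 2 ($s = 3$). This is an unsound illustration under stated simplifications, not the certified procedure: step 1 uses plain floating point instead of interval arithmetic; step 2 uses a square-root-free variant of the Cholesky decomposition (an $LDL^T$ factorization whose pivots are the squares of the Cholesky diagonal, so it fails at the same step) and omits the rounding-error bound $B$. The "solver output" in the usage is a hand-made perturbation of $Q'' := \begin{bmatrix} 2 & -2 & 1 \\ -2 & 5 & 0 \\ 1 & 0 & 3 \end{bmatrix}$, a positive definite matrix we chose that satisfies the constraints of Example 2. All function names are hypothetical.

```c
#include <assert.h>

#define S 3 /* size of z = (x1^2, x2^2, x1*x2) in Example 2 */

/* Step 1 (simplified): max |coefficient| of p - z^T Q z, Q row-major.
   Plain floating point: NOT the guaranteed bound of the text. */
static double coeff_residual(const double *q, const double *p) {
    double c[5] = {q[0], 2.0 * q[2], q[8] + 2.0 * q[1], 2.0 * q[5], q[4]};
    double eps = 0.0;
    for (int k = 0; k < 5; k++) {
        double e = p[k] - c[k];
        if (e < 0) e = -e;
        if (e > eps) eps = e;
    }
    return eps;
}

/* Step 2 (simplified): Q - shift*I is positive definite iff all pivots of
   its square-root-free Cholesky (LDL^T) factorization are > 0. */
static int shifted_pivots_positive(const double *q, double shift) {
    double l[S * S] = {0}, d[S];
    for (int j = 0; j < S; j++) {
        d[j] = q[j * S + j] - shift;
        for (int k = 0; k < j; k++) d[j] -= l[j * S + k] * l[j * S + k] * d[k];
        if (!(d[j] > 0.0)) return 0; /* Cholesky would fail here */
        for (int i = j + 1; i < S; i++) {
            double v = q[i * S + j]; /* off-diagonal entries are unshifted */
            for (int k = 0; k < j; k++) v -= l[i * S + k] * l[j * S + k] * d[k];
            l[i * S + j] = v / d[j];
        }
    }
    return 1;
}

/* 1 if Q is accepted as evidence that p is SOS (illustration only). */
int check_sos_certificate(const double *q, const double *p) {
    double eps = coeff_residual(q, p);
    assert(eps >= 0.0);
    return shifted_pivots_positive(q, S * eps);
}
```

The perturbed $Q''$ is accepted with $\varepsilon \approx 2 \cdot 10^{-9}$, while the matrix $Q$ of Example 2 represents $p$ exactly but is rejected at the third pivot.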


Complexity. Note that step 1 can be achieved using floating-point interval
arithmetic in $O(s^2)$ operations, while the Cholesky decomposition in step 2
requires $O(s^3)$ floating-point operations. Thus, the whole verification method
takes $O(s^3)$ floating-point operations which, in practice, constitutes a very small
overhead compared to the time required by the SDP solver to compute $Q$.


Soundness. It is interesting to notice that the soundness of the method does 
not rely on the SDP solver. Thanks to this pessimistic method, the trusted codebase
remains small, and efficient off-the-shelf solvers can be used as untrusted
oracles. The method was even verified [31,38] within the Coq proof assistant. 


Incompleteness. Numerical verification methods can only prove inequalities
satisfied with some margin. Here, if the polynomial $p$ to prove SOS (hence $p \ge 0$)
reaches the value 0, this usually means that the feasible set of the SDP problem
$\{Q \mid p = z^T Q z,\ Q \succeq 0\}$ has an empty relative interior (i.e., there is no point $Q$
in this set such that a small ball centered on $Q$ is included in $\{M \mid M \succeq 0\}$),
and the method does not work, as illustrated in Fig. 2. This is a second source
of incompleteness of our approach, in addition to the limitation of the degrees of the
polynomials searched for, presented in Sect. 3.1.


Remark 3. The floating-point Cholesky decomposition is theoretically a third
source of incompleteness. However, it is negligible, as the entries of the bound
matrix $B$ are, in practice, orders of magnitude smaller than the accuracy $\varepsilon$ of
the SDP solvers [40].


11 It is worth noting that the value reported by the solver for $\varepsilon$, being computed
with floating-point arithmetic, cannot be formally trusted. It must then be recomputed.
[Figure: the affine subspace $\{M \mid p = z^T M z\}$ and the cone $\{M \mid M \succeq 0\}$.]


Fig. 2. When the feasible set has an empty interior, the subspace $\{M \mid p = z^T M z\}$
is tangent to $\{M \mid M \succeq 0\}$. Thus the ball $\{Q + E\}$ intersecting the subspace almost
never lies in $\{M \mid M \succeq 0\}$, making the proof fail.


4.3 Rounding to an Exact Rational Solution 


The most common solution to verify results of SOS programming is to round
the output of the SDP solver to an exact rational solution [19, 24, 33].

In short, the matrix $Q$ returned by the SDP solver is first projected onto
the subspace $\{M \mid p = z^T M z\}$, then all its entries are rounded to rationals with
small denominators (first integers, then multiples of increasingly small fractions)¹². For each rounding,
positive semi-definiteness of the resulting matrix $Q$ is tested using a complete
check, based on an $LDL^T$ decomposition¹³ [19]. The rationale behind this choice
is that problems involving only simple rational coefficients can reasonably be
expected to admit simple rational solutions¹⁴.

Using exact solutions potentially makes it possible to verify SDP problems with empty
relative interiors. This means the ability to prove inequalities without margin, to
distinguish strict and non-strict inequalities, and even to handle (dis)equalities.
All of this nevertheless requires a different relaxation scheme than (1).
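The following C sketch (our own; `rat`, `exact_psd` and `round_to_int` are hypothetical names) shows the two ingredients on Example 2: rounding to integers, the first level of the heuristic, and a complete positive semi-definiteness test by an exact $LDL^T$ decomposition over rationals. The projection step and finer denominators are omitted, and the `long long` fractions are an assumption that only suits tiny examples; a real implementation would use arbitrary-precision rationals.

```c
#include <assert.h>

#define N 3 /* matrix size of Example 2 */

typedef struct { long long n, d; } rat; /* n/d with d > 0, lowest terms */

static long long gcd_ll(long long a, long long b) {
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) { long long t = a % b; a = b; b = t; }
    return a;
}

static rat mk(long long n, long long d) {
    assert(d != 0);
    if (d < 0) { n = -n; d = -d; }
    long long g = gcd_ll(n, d); /* g >= 1 since d > 0 */
    rat r = {n / g, d / g};
    return r;
}

static rat sub(rat a, rat b) { return mk(a.n * b.d - b.n * a.d, a.d * b.d); }
static rat mul(rat a, rat b) { return mk(a.n * b.n, a.d * b.d); }
static rat dvd(rat a, rat b) { return mk(a.n * b.d, a.d * b.n); }

/* Complete PSD test of a symmetric integer matrix by exact LDL^T without
   pivoting: a negative pivot means "not PSD"; a zero pivot is allowed only
   if the rest of its column (in the Schur complement) is zero too. */
int exact_psd(const long long *m) {
    rat l[N][N], dg[N];
    for (int j = 0; j < N; j++) {
        dg[j] = mk(m[j * N + j], 1);
        for (int k = 0; k < j; k++)
            dg[j] = sub(dg[j], mul(mul(l[j][k], l[j][k]), dg[k]));
        if (dg[j].n < 0) return 0;
        for (int i = j + 1; i < N; i++) {
            rat v = mk(m[i * N + j], 1);
            for (int k = 0; k < j; k++)
                v = sub(v, mul(mul(l[i][k], l[j][k]), dg[k]));
            if (dg[j].n == 0) {
                if (v.n != 0) return 0;
                l[i][j] = mk(0, 1);
            } else {
                l[i][j] = dvd(v, dg[j]);
            }
        }
    }
    return 1;
}

/* First level of the rounding heuristic: nearest integer (ties away from 0). */
void round_to_int(const double *q, long long *out) {
    for (int k = 0; k < N * N; k++)
        out[k] = (long long)(q[k] < 0 ? q[k] - 0.5 : q[k] + 0.5);
}
```

On $Q'$ the exact pivots are $2$, $1/2$ and $0$: the matrix is singular, so it has no margin at all, yet the exact test accepts it, which a floating-point Cholesky check could not guarantee.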


Example 4. To prove $x_1 \ge 0 \wedge x_2 \ge 0 \wedge q_1 = 0 \wedge q_2 = 0 \wedge p > 0$ unsatisfiable,
with qı := r? } te Ts Fi 2, dg := T1£3 + £o£4 and p := T3£4 — © Xo, 
one can look for polynomials $l_1, l_2$ and SOS polynomials $s_1, \dots, s_8$ such that

$$l_1 q_1 + l_2 q_2 + s_1 + s_2 p + s_3 x_1 + s_4 x_1 p + s_5 x_2 + s_6 x_2 p + s_7 x_1 x_2 + s_8 x_1 x_2 p + p = 0.$$
Rounding the result of an SDP solver yields lı = —4 (£1£2 — £344), l2 = 
at ay 2 ale ee eee ee ay ats 
$ (£223 +2104), 52 = 4 (£3 + x2), s7 = 4 (£? +03 +23 + x?) and sı = 53 = 
$s_4 = s_5 = s_6 = s_8 = 0$. This problem has no margin, since when replacing $p > 0$
by $p \ge 0$, $(x_1, x_2, x_3, x_4) = (0, \sqrt{2}, 0, 0)$ becomes a solution.


Under some hypotheses, this relaxation scheme is complete, as stated by 
a theorem from Stengle [27, Theorem 2.11]. However, similarly to Sect. 3.1, no 
practical bound is known on the degrees of the relaxation polynomials. 


12 In practice, to ensure that the rounded matrix $Q$ still satisfies the equality $p = z^T Q z$,
a dual SDP encoding is used, which differs from the encoding introduced in Sect. 3.
This dual encoding is also called image representation [36, Sect. 6.1].

13 The $LDL^T$ decomposition expresses a positive semi-definite matrix $M$ as $M = LDL^T$
with $L$ a lower triangular matrix and $D$ a diagonal matrix.

14 However, there exist rational SDP problems that do not admit any rational solution. 
Complexity. The relaxation scheme involves products of all polynomials
appearing in the original problem constraints. The number of such products,
being exponential in the number of constraints, limits the scalability of the
approach.
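To see the growth, the C sketch below (hypothetical helpers `product_count` and `product_name`) enumerates the products of a set of inequality constraints, one per subset. For the three inequalities $x_1 \ge 0$, $x_2 \ge 0$ and $p > 0$ of Example 4, it yields the $2^3 = 8$ products multiplied by $s_1, \dots, s_8$, although in a different order than in the example.

```c
#include <assert.h>
#include <string.h>

/* Number of products of m constraint polynomials (one per subset). */
unsigned long long product_count(int m) {
    assert(m >= 0 && m < 64);
    return 1ULL << m;
}

/* Writes into buf the product selected by mask (bit i picks names[i]);
   the empty product is "1". */
void product_name(const char *const *names, int m, unsigned mask,
                  char *buf, size_t size) {
    assert(size > 1 && m >= 0 && m < 32);
    buf[0] = '\0';
    for (int i = 0; i < m; i++) {
        if (!(mask & (1u << i))) continue;
        if (buf[0] != '\0') strncat(buf, "*", size - strlen(buf) - 1);
        strncat(buf, names[i], size - strlen(buf) - 1);
    }
    if (buf[0] == '\0') strncat(buf, "1", size - 1);
}
```

Each extra inequality doubles the number of SOS multipliers, and thus of matrix variables in the SDP.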

Moreover, to actually enjoy the benefits of exact solutions, the floating-point
Cholesky decomposition introduced in Sect. 4.2 cannot be used and has to be
replaced by an exact rational decomposition¹⁵. Computing decompositions of
large matrices can then become particularly costly, as the size of the involved
rationals can blow up exponentially during the computation.


Soundness. The exact solutions make verification easy. The method has
thus been implemented in the HOL Light [19] and Coq [4] proof assistants.


Incompleteness. Although this verification method can work for some SDP
problems with an empty relative interior, the rounding heuristic is not
guaranteed to provide a solution. In practice, it tends to fail on large problems or
problems whose coefficients are not rationals with small numerators and
denominators.


5 Experimental Results 


5.1 The OSDP Library 


The SOS to SDP translation described in Sect. 3, as well as the validation
methods described in Sect. 4, have been implemented in our OCaml library OSDP.
This library offers a common interface to the SDP solvers¹⁶ Csdp [6], Mosek [2]
and SDPA [46], giving simple access to SOS programming in contexts where
soundness matters, such as SMT solvers or program static analyzers. It is
composed of 5 kloc of OCaml and 1 kloc of C (interfaces with SDP solvers) and is
available under the LGPL license at https://cavale.enseeiht.fr/osdp/.


5.2 Integration of OSDP in Alt-Ergo 


Alt-Ergo [5] is a very effective SMT solver for proving formulas generated by
program verification frameworks. It is used as a back-end of different tools and
in various settings, in particular via the Why3 [16] platform. For instance, the
Frama-C [12] suite relies on it to prove formulas generated from C code, and the
SPARK [21] toolset uses it to check formulas produced from Ada programs. It is
also used to prove formulas coming from cryptographic protocol verification in
EasyCrypt [3], from the Cubicle [10] model checker, and from Atelier-B [1].


15 The Cholesky decomposition, involving square roots, cannot be computed in rational
arithmetic; however, its $LDL^T$ variant can.
16 Csdp is used for the following benchmarks as it provides the best results. 
Alt-Ergo's native input language is a polymorphic first-order logic à la ML
modulo theories, a very suitable language for expressing formulas generated in
the context of program verification. Its reasoning engine is built on top of a
SAT solver that interacts with a combination of decision procedures to look for
a model for the input formula. Universally quantified formulas, which naturally
arise in program verification, are handled via E-matching techniques. Currently,
Alt-Ergo implements decision procedures for the free theory of equality with
uninterpreted symbols, linear arithmetic over integers and rationals, fragments
of non-linear arithmetic, enumerated and record datatypes, and the theory of
associative and commutative function symbols (hereafter AC).

Figure 3 shows the simplified architecture of the arithmetic reasoning framework
in Alt-Ergo, and the OSDP extension. The first component in the figure is a
completion-like algorithm AC(LA) that reasons modulo associativity and
commutativity properties of non-linear multiplication, as well as its distributivity
over addition¹⁷. AC(LA) is a modular extension of ground AC completion with a
decision procedure for reasoning modulo equalities of linear integer and rational
arithmetic [9]. It builds and maintains a convergent term-rewriting system
modulo arithmetic equalities and the AC properties of the non-linear multiplication
symbol. The rewriting system is used to update a union-find data structure.


[Figure: (1) AC(LA) framework, with AC-completion modulo theories and union-find; (2) Interval Calculus; OSDP extension (encoding, answer unsat/unknown).]


Fig. 3. Alt-Ergo’s arithmetic reasoning framework with OSDP integration. 


The second component is an Interval Calculus algorithm that computes
bounds of (non-linear) terms: the initial non-linear problem is first relaxed by
abstracting non-linear parts, and a Fourier-Motzkin extension¹⁸ is used to infer
bounds on the abstracted linear problem. In a second step, axioms of non-linear
arithmetic are internally applied by interval propagation. These two steps make it
possible to maintain a map associating the terms of the problem (which are normalized
w.r.t. the union-find) to unions of intervals.

Finally, the last part is the SAT solver that dispatches equalities and inequal- 
ities to the right component and performs case-split analysis over finite domains. 
Of course, this presentation is very simplified and the exact architecture of Alt- 
Ergo is much more complicated. 


17 Addition and multiplication by a constant are directly handled by the LA module.
18 We can also use a simplex-based algorithm [8] for bounds inference. 
$p'_1 := (p_1 - a_1)(b_1 - p_1), \dots, p'_k := (p_k - a_k)(b_k - p_k)$
  // or $p'_i := p_i - a_i$ when $b_i = +\infty$, or $p'_i := b_i - p_i$ when $a_i = -\infty$
$d := \max_i \{\deg(p'_i)\}$
encode $-\sum_{i=1}^{k} r_i p'_i$ is SOS, $r_1$ is SOS, ..., $r_k$ is SOS
  as an SDP problem $-\sum_{i=1}^{k} r_i p'_i = z_0^T Q_0 z_0$, $r_1 = z_1^T Q_1 z_1$, ..., $r_k = z_k^T Q_k z_k$
  with $\deg(r_i) := 2 \lfloor (d - \deg(p'_i))/2 \rfloor$
call an SDP solver and retrieve $r_1, \dots, r_k$ and $Q_0, Q_1, \dots, Q_k$
overapproximate $\varepsilon_i := \max_\alpha \{ |c_\alpha| \mid r_i - z_i^T Q_i z_i = \sum_\alpha c_\alpha x^\alpha \}$, with $r_0 := -\sum_{i=1}^{k} r_i p'_i$
if $1 \in z_0 \wedge Q_0 - \#|z_0| \varepsilon_0 I \succ 0 \wedge Q_1 - \#|z_1| \varepsilon_1 I \succeq 0 \wedge \dots \wedge Q_k - \#|z_k| \varepsilon_k I \succeq 0$ then
  return Unsat
else
  return Unknown
end if

Fig. 4. Semi-decision procedure to prove $\bigwedge_{i=1}^{k} p_i \in [a_i, b_i]$ unsat. $\#|z|$ is the size of the
vector $z$ and $\succeq 0$ is tested with a floating-point Cholesky decomposition [41].


The integration of OSDP in Alt-Ergo is achieved via the extension of the 
Interval Calculus component of the solver, as shown in Fig. 3: terms that are 
polynomials, and their corresponding interval bounds, form the problem (1) 
which is given to OSDP. OSDP attempts to verify its result with the method 
of Sect. 4.2. When it succeeds, the original conjunction of constraints is proved 
unsat. Otherwise, (dis)equalities are added and OSDP attempts a new proof by 
the method of Sect. 4.3. In case of success, unsat is proved, otherwise
satisfiability or unsatisfiability cannot be deduced. Outlines of the first algorithm are
given in Fig. 4 whereas the second one follows the original implementation [19]. 

Our modified version of Alt-Ergo is available under the CeCILL-C license at
https://cavale.enseeiht.fr/osdp/aesdp/.


Incrementality. In the SMT context, our theory solver is often successively
called with the same problem with a few additional constraints each time. It
would then be interesting to avoid redoing the whole computation when
a constraint is added, as is usually done with the simplex algorithm for linear
arithmetic.

Some SDP solvers do accept an initial point. Our experiments, however,
indicated that this significantly speeds up the computation only when the
provided point is extremely close to the solution. A bad initial point could even
slow down the computation or, worse, make it fail. This is due to the very
different nature of interior point algorithms, compared to the simplex, and their
convergence properties [7, Part III]. Thus, speedups could only be obtained
when the previous set of constraints was already unsatisfiable, i.e., a useless case.
Small Conflict Sets. When a set of constraints is unsatisfiable, some of them
may not play any role in this unsatisfiability. Returning a small subset of
unsatisfiable constraints can help the underlying SAT solver. Such useless constraints
can easily be identified in (1) when the relaxation polynomial $r_i$ is 0. A common
heuristic to maximize their number is to ask the SDP solver to minimize (the
sum of) the traces of the matrices $Q_i$.

When using the exact method of Sect. 4.3, the appropriate $r_i$ are exactly 0.
Things are not so clear when using the approximate method of Sect. 4.2, since the
$r_i$ are only close to 0. A simple solution is to rank the $r_i$ by decreasing trace of
$Q_i$ before performing a dichotomy search for the smallest prefix of this sequence
proved unsatisfiable. Thus, for $n$ constraints, about $\log_2(n)$ SDPs are solved.
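The dichotomy search can be sketched in C as follows. This is an illustration under two assumptions we state explicitly: the full ranked sequence is already proved unsat, and "the prefix of length $k$ is proved unsat" is monotone in $k$. The oracle type, `smallest_unsat_prefix` and the stand-in `threshold_oracle` are hypothetical; in the solver, one oracle call would be one SDP solve followed by the check of Sect. 4.2.

```c
#include <assert.h>

/* Hypothetical oracle: returns 1 if the first k ranked constraints are
   proved unsat (e.g. one SDP solve followed by the check of Sect. 4.2). */
typedef int (*prefix_oracle)(int k, void *ctx);

/* Dichotomy search for the smallest k in [1, n] whose prefix is proved
   unsat. Assumes the full set (k = n) is unsat and that the answer is
   monotone in k. Counts oracle calls, about log2(n) of them. */
int smallest_unsat_prefix(int n, prefix_oracle oracle, void *ctx, int *calls) {
    assert(n >= 1);
    int lo = 1, hi = n; /* invariant: the prefix of length hi is unsat */
    *calls = 0;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        ++*calls;
        if (oracle(mid, ctx)) hi = mid;
        else lo = mid + 1;
    }
    return hi;
}

/* Stand-in oracle for testing: "unsat" exactly when k >= *(int *)ctx. */
int threshold_oracle(int k, void *ctx) { return k >= *(const int *)ctx; }
```

With 16 ranked constraints and a conflict among the first 3, the search returns 3 after 4 oracle calls.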


5.3 Experimental Results 


We compared our modified version of Alt-Ergo (v. 1.30) to the SMT solvers that ran
in both the QF_NIA and QF_NRA sections of the last SMT-COMP. We ran
the solvers on two sets of benchmarks. The first set comes from the QF_NIA
and QF_NRA benchmarks of the last SMT-COMP. The second set contains
four subsets. The C problems are generated by Frama-C/Why3 [12, 16] from
control-command C programs such as the one from Sect. 2, with up to a dozen
variables [11, 39]. To distinguish difficulties coming from the handling of the
memory model of C, for which Alt-Ergo was particularly designed, from those coming
from the actual non-linear arithmetic problem, the quadratic benchmarks contain
simplified versions of the C problems with a purely arithmetic goal. To demonstrate
that the interest of our approach is not limited to this initial target application,
the flyspeck benchmarks come from the benchmark sets of dReal¹⁹ [18] and
global-opt are global optimization benchmarks [34]. All these benchmarks are
available at https://cavale.enseeiht.fr/osdp/aesdp/. Since our solver only targets
unsat proofs, benchmarks known to be sat were removed from both sets.

All experiments were conducted on an Intel Xeon 2.30 GHz processor, with
individual runs limited to 2 GB of memory and 900 s. The results are presented
in Tables 1, 2 and 3. For each subset of problems, the first line (unsat) indicates the
number of problems that each solver managed to prove unsat and the second line (time)
presents the cumulative time (in seconds) for these problems. AE is the original
Alt-Ergo, AESDP our new version, AESDPap the same but using only the
approximate method of Sect. 4.2, and AESDPex the same but using only the exact method of
Sect. 4.3. All solvers were run with default options, except CVC4, which was run
with all its --nl-ext* options.

As seen in Tables 1 and 2, despite an improvement over Alt-Ergo alone, our 
development is not competitive with state-of-the-art solvers on the QF_NIA and 
QF_NRA benchmarks. In fact, the set of problems solved by any of our Alt-Ergo 
versions is strictly included in the set of problems solved by at least one of the 
other solvers. The most commonly observed source of failure for AESDPap here 
comes from SDPs with empty relative interior. Although AESDPex can handle 
such problems, it is impaired by its much higher complexity. 


19 After removing problems containing the functions sin and cos, which are not handled by our tool.
Table 1. Experimental results on benchmarks from QF_NIA.

                          AE      AESDP   AESDPap AESDPex CVC4    Smtrat  Yices2  Z3
AProVE (746)       unsat  103     319     359     318     586     185     709     252
                   time   7387    23968   7664    22701   10821   3879    1982    5156
calypto (97)       unsat  92      88      88      89      87      89      97      95
                   time   357     679     489     816     7       754     409     613
LassoRanker (102)  unsat  57      62      64      63      72      20      84      84
                   time   9       959     274     878     27      12      595     2538
LCTES (2)          unsat  0       0       0       0       1       0       0       0
                   time   0       0       0       0       0       0       0       0
leipzig (5)        unsat  0       0       0       0       0       0       1       0
                   time   0       0       0       0       0       0       0       0
mem (161)          unsat  0       0       0       0       4       0       0       4
                   time   0       0       0       0       2489    0       0       2527
UltimateAutom (7)  unsat  1       7       7       7       6       1       7       7
                   time   0.35    0.73    0.62    0.69    0.03    7.22    0.04    0.31
UltimateLasso (26) unsat  26      26      26      26      4       26      26      26
                   time   118     212     126     215     66      177     6       21
total (1146)       unsat  279     502     544     503     780     321     924     468
                   time   7872    25818   8553    24611   13411   4829    2993    10855


However, good results are obtained on the more numerical²⁰ second set of
benchmarks. In particular, control-command programs with up to a dozen
variables are verified, while other solvers remain limited to two variables. A
key point in this result is that the inequalities in these benchmarks are satisfied with
some margin. For control-command programs, this comes from the fact that
they are designed to be robust to many small errors. This opens new
perspectives for the verification of functional properties of control-command programs,
particularly in the aerospace domain, our main application field at ONERA²¹.

Although solvers such as dReal, based on branch and bound with interval
arithmetic, could be expected to perform well on these numerical benchmarks,
dReal solves fewer benchmarks than most other solvers. Geometrically speaking,
the C benchmarks require proving that an ellipsoid is included in a slightly larger
one, i.e., the borders of both ellipsoids are close to one another. This requires
subdividing the space between the two borders into many small boxes so that none
of them intersects both the interior of the first ellipsoid and the exterior of the
second one. Whereas this can remain tractable for low-dimensional ellipsoids,
the number of required boxes grows exponentially with the dimension, which
explains the poor results of dReal. This issue is unfortunately shared, to a large
extent, by any linear relaxation, including more elaborate ones [30].


20 Involving polynomials with a few dozen monomials or more and whose coefficients 
are not integers or rationals with small numerators and denominators. 
21 French public agency for aerospace research. 
Table 2. Experimental results on benchmarks from QF_NRA.

                          AE      AESDP   AESDPap AESDPex CVC4    Smtrat  Yices2  Z3
Sturm-MBO (300)    unsat  155     155     155     155     285     285     2       47
                   time   12950   13075   13053   12973   1403    620     0       21
Sturm-MGC (7)      unsat  0       0       0       0       1       1       0       7
                   time   0       0       0       0       7       0       0       0
Heizmann (68)      unsat  0       0       0       0       1       1       11      3
                   time   0       0       0       0       16      0       2083    41
hong (20)          unsat  1       20      20      20      20      20      8       9
                   time   0       28      24      27      ?       0       240     6
hycomp (2494)      unsat  1285    1266    1271    1265    2184    1588    2182    2201
                   time   15351   15857   16080   14909   208     13784   1241    4498
keymaera (320)     unsat  261     291     278     291     249     307     270     318
                   time   36      356     97      360     4       13      359     2
LassoRanker (627)  unsat  0       0       0       0       441     0       236     119
                   time   0       0       0       0       32786   0       30835   1733
meti-tarski (2615) unsat  1882    2273    2267    2241    1643    2520    2578    2611
                   time   10      91      65      73      804     3345    2027    337
UltimateAutom (13) unsat  0       0       0       0       5       0       12      13
                   time   0       0       0       0       0.52    0       57.19   19.23
zankl (85)         unsat  14      24      24      24      24      19      32      27
                   time   1.00    15.46   16.09   15.67   9.40    13.47   7.22    0.43
total (6549)       unsat  3571    4029    4015    3996    4853    4740    5331    5355
                   time   28348   29423   29334   28357   35239   17775   36849   6658


Table 3. Experimental results on benchmarks from [11,18,34,39].

Benchmark (count) |       | AE   | AESDP | AESDPap | AESDPex | CVC4   | Smtrat | Yices2 | Z3     | dReal
C (67)            | unsat | 11   | 63    | 63      | 13      | 0      | 0      | 0      | 0      | 0
                  | time  | 0.05 | 39.78 | 40.01   | 1.18    | 0      | 0      | 0      | 0      | 0
quadratic (67)    | unsat | 13   | 67    | 67      | 15      | 14     | 18     | 25     | 25     | 13
                  | time  | 0.06 | 14.68 | 15.44   | 0.08    | 2.46   | 1.26   | 357.20 | 257.39 | 23.36
flyspeck (20)     | unsat | 1    | 19    | 19      | 3       | 6      | 9      | 10     | 9      | 16
                  | time  | 0.00 | 26.35 | 26.62   | 0.01    | 695.59 | 36.54  | 0.05   | 0.05   | 11.77
global-opt (14)   | unsat | 2    | 14    | 14      | 5       | 5      | 12     | 12     | 13     | 9
                  | time  | 0.01 | 8.72  | 8.83    | 0.20    | 0.12   | 41.18  | 0.16   | 683.45 | 0.05
total (168)       | unsat | 27   | 163   | 163     | 36      | 25     | 39     | 47     | 47     | 38
                  | time  | 0.12 | 89.53 | 90.90   | 1.47    | 698.17 | 78.98  | 357.41 | 940.89 | 35.18


6 Related Work and Conclusion 


Related work. MONNIAUX and CORBINEAU [33] improved the rounding heuristic of HARRISON [19]. Unfortunately, this has no impact on the complexity of the relaxation scheme. PLATZER et al. [37] compared early versions of these methods with symbolic methods based on quantifier elimination and Gröbner bases. An intermediate solution is offered by MAGRON et al. [29], but it only handles a restricted class of parametric problems.

Branch-and-bound and interval arithmetic constitute another numerical approach to non-linear arithmetic, as implemented in the SMT solver dReal by GAO et al. [17,18]. These methods easily handle non-linear functions such as the trigonometric functions sin or cos, which are not yet considered in our prototype.²² In the case of polynomial inequalities, MUÑOZ and NARKAWICZ [34] offer Bernstein polynomials as an improvement over simple interval arithmetic.

Finally, VSDP [20,22] is a wrapper around SDP solvers offering a method similar to the one of Sect. 4.2. Moreover, an implementation is also offered by LÖFBERG [28] in the popular Matlab interface Yalmip, but it remains unsound, since all computations are performed with floating-point arithmetic, ignoring rounding errors.

Using convex optimization within an SMT solver was already proposed by Nuzzo et al. [35,43]. However, they intentionally made their solver unsound in order to lean toward completeness. While this can make sense in a bounded model checking context, soundness is required for many applications, such as program verification. Moreover, this proposal was limited to convex formulas. Although this restriction makes it possible to provide models for satisfiable formulas (whereas only unsatisfiable formulas are considered in this paper) and seems a perfect choice for bounded model checking applications, non-convex formulas are pervasive in applications such as program verification.²³

The use of numerical off-the-shelf solvers in SMT tools has also been studied in the framework of linear arithmetic [15,32]. Comparisons with state-of-the-art exact simplex procedures show mixed results [14], but better results can be obtained by combining both approaches [25].


Conclusion. We presented a semi-decision procedure for non-linear polynomial constraints over the reals, based on numerical optimization solvers. Since these solvers only compute approximate solutions, a posteriori soundness checks were investigated. Our first prototype, implemented in the Alt-Ergo SMT solver, shows that, although the new numerical method does not strictly outperform state-of-the-art symbolic methods, it makes it possible to solve practical problems that are out of reach for other methods. In particular, this is demonstrated on the verification of functional properties of control-command programs. Such properties are of significant importance for critical cyber-physical systems.

It could thus be worth studying the combination of symbolic and numerical methods in the hope of benefiting from the best of both worlds.


22 Polynomial approximations such as Taylor expansions should be investigated. 
23 Typically, to prove a convex loop invariant $I$ for a loop body $f$, one needs to prove $I \Rightarrow I(f)$, that is, $\neg I \vee I(f)$, which is likely non-convex ($\neg I$ being concave).
Abstract. We consider the problem of approximate reduction of non-deterministic automata that appear in hardware-accelerated network intrusion detection systems (NIDSes). We define an error distance of a reduced automaton from the original one as the probability of packets being incorrectly classified by the reduced automaton (wrt the probability distribution of packets in the network traffic). We use this notion to design an approximate reduction procedure that achieves a great size reduction (much beyond the state-of-the-art language-preserving techniques) with a controlled and small error. We have implemented our approach and evaluated it on use cases from SNORT, a popular NIDS. Our results provide experimental evidence that the method can be highly efficient in practice, allowing NIDSes to follow the rapid growth in the speed of networks.


1 Introduction 


Recent years have seen a boom in the number of security incidents in computer networks. In order to alleviate the impact of network attacks and intrusions, Internet providers want to detect malicious traffic at their network's entry
points and on the backbones between sub-networks. Software-based network 
intrusion detection systems (NIDSes), such as the popular open-source system 
SNORT [1], are capable of detecting suspicious network traffic by testing (among 
others) whether a packet payload matches a regular expression (regex) describing 
known patterns of malicious traffic. NIDSes collect and maintain vast databases 
of such regexes that are typically divided into groups according to types of the 
attacks and target protocols. 

Regex matching is the most computationally demanding task of a NIDS as its 
cost grows with the speed of the network traffic as well as with the number and 
complexity of the regexes being matched. Current software-based NIDSes cannot perform regex matching on networks faster than 1 Gbps [2,3], so they cannot handle the current speed of backbone networks, which ranges between tens and hundreds of Gbps. A promising approach to speed up NIDSes is to (partially) offload regex matching into hardware [3-5]. The hardware then serves as a pre-filter of the network traffic, discarding the majority of the packets from further processing. Such pre-filtering can easily reduce the traffic the NIDS needs to handle by two or three orders of magnitude [3].

Field-programmable gate arrays (FPGAs) are the leading technology in high- 
throughput regex matching. Due to their inherent parallelism, FPGAs pro- 
vide an efficient way of implementing nondeterministic finite automata (NFAs), 
which naturally arise from the input regexes. Although the amount of avail- 
able resources in FPGAs is continually increasing, the speed of networks grows 
even faster. Working with multi-gigabit networks requires the hardware to use many parallel packet processing branches in a single FPGA [5], each of them implementing a separate copy of the concerned NFA, so reducing the size of the NFAs is of the utmost importance. Various language-preserving automata
reduction approaches exist, mainly based on computing (bi)simulation relations 
on automata states (cf. the related work). The reductions they offer, however, 
do not satisfy the needs of high-speed hardware-accelerated NIDSes. 

Our answer to the problem is approximate reduction of NFAs, allowing for 
a trade-off between the achieved reduction and the precision of the regex match- 
ing. To formalise the intuitive notion of precision, we propose a novel probabilistic 
distance of automata. It captures the probability that a packet of the input net- 
work traffic is incorrectly accepted or rejected by the approximated NFA. The 
distance assumes a probabilistic model of the network traffic (we show later how 
such a model can be obtained). 

Having formalised the notion of precision, we specify the target of our reduc- 
tions as two variants of an optimization problem: (1) minimizing the NFA size 
given the maximum allowed error (distance from the original), or (2) minimizing 
the error given the maximum allowed NFA size. Finding such optimal approx- 
imations is, however, computationally hard (PSPACE-complete, the same as 
precise NFA minimization). 

Consequently, we sacrifice the optimality and, motivated by the typical struc- 
ture of NFAs that emerge from a set of regexes used by NIDSes (a union of many 
long “tentacles” with occasional small strongly-connected components), we limit 
the space of possible reductions by restricting the set of operations they can apply 
to the original automaton. Namely, we consider two reduction operations: (i) col- 
lapsing the future of a state into a self-loop (this reduction over-approximates 
the language), or (ii) removing states (such a reduction is under-approximating). 

The problem of identifying the optimal sets of states on which these oper- 
ations should be applied is still PSPACE-complete. The restricted problem 
is, however, more amenable to an approximation by a greedy algorithm. The 
algorithm applies the reductions state-by-state in an order determined by a pre- 
computed error labelling of the states. The process is stopped once the given 
optimization goal in terms of the size or error is reached. The labelling is based 
on the probability of packets that may be accepted through a given state and 
hence over-approximates the error that may be caused by applying the reduction 
at a given state. As our experiments show, this approach can give us high-quality 
reductions while ensuring formal error bounds. 
Finally, it turns out that even the pre-computation of the error labelling of the states is costly (again PSPACE-complete). Therefore, we propose several ways to cheaply over-approximate it such that the strong error bound guarantees are still preserved. In particular, we are able to exploit the typical "union of tentacles" structure of the hardware NFA in an algorithm that is exponential only in the size of the largest "tentacle", which is much faster in practice.

We have implemented our approach and evaluated it on regexes used to 
classify malicious traffic in SNORT. We obtain quite encouraging experimental 
results demonstrating that our approach provides a much better reduction than 
language-preserving techniques with an almost negligible error. In particular, our experiments, going down to the level of an actual implementation of NFAs in FPGAs, confirm that we can fit real-life regexes encoding malicious traffic into an up-to-date FPGA chip, allowing them to be used with a negligible error for filtering at speeds of 100 Gbps (and even 400 Gbps). This is far beyond what one can achieve with current exact reduction approaches.


Related Work. Hardware acceleration for regex matching at the line rate is 
an intensively studied technology that uses general-purpose hardware [6-14] as 
well as FPGAs [3-5,15-20]. Most of the works focus on DFA implementation 
and optimization techniques. NFAs can be exponentially smaller than DFAs but 
need, in the worst case, O(n) memory accesses to process each byte of the pay- 
load where n is the number of states. In most cases, this incurs an unacceptable 
slowdown. Several works alleviate this disadvantage of NFAs by exploiting recon- 
figurability and fine-grained parallelism of FPGAs, allowing one to process one 
character per clock cycle (e.g. [3-5,15,16,19,20]). 

In [14], which is probably the closest work to ours, the authors consider a set 
of regexes describing network attacks. They replace a potentially prohibitively 
large DFA by a tree of smaller DFAs, an alternative to using NFAs that minimizes the latency occurring in a non-FPGA-based implementation. The language of every DFA-node in the tree over-approximates the languages of its children. Packets are filtered through the tree from the root downwards as long as they belong to the languages of the encountered nodes; they may finally be accepted at the leaves, and are rejected otherwise. The over-approximating DFAs are constructed using a similar notion of probability of an occurrence of a state as in our approach. The main differences from our work are that (1) the approach targets
approximation of DFAs (not NFAs), (2) the over-approximation is based on a 
given traffic sample only (it cannot benefit from a probabilistic model), and 
(3) no probabilistic guarantees on the approximation error are provided. 

Approximation of DFAs was considered in various other contexts. Hyperminimization is an approach that is allowed to alter language membership of a finite set of words [21,22]. A DFA with a given maximum number of states is constructed in [23], minimizing the error defined either by (i) counting prefixes of misjudged words up to some length, or (ii) the sum of the probabilities of the misjudged words wrt the Poisson distribution over $\Sigma^*$. Neither of these approaches considers reduction of NFAs or allows one to control the expected error with respect to the real traffic.
In addition to the metrics mentioned above when discussing [21-23], the following metrics are also relevant. The Cesàro-Jaccard distance studied in [24] is, in spirit, similar to [23] and also does not reflect the probability of individual words. The edit distance of weighted automata from [25] depends on the minimum edit distance between pairs of words from the two compared languages, again regardless of their statistical significance. None of these notions is suitable for our needs.

Language-preserving minimization of NFAs is a PSPACE-complete problem 
[26,38]. More feasible (polynomial-time) is language-preserving size reduction of 
NFAs based on (bi)simulations [27-30], which does not aim for a truly minimal 
NFA. A number of advanced variants exist, based on multi-pebble or look-ahead 
simulations, or on combinations of forward and backward simulations [31-33]. 
The practical efficiency of these techniques is, however, often insufficient to allow 
them to handle the large NFAs that occur in practice and/or they do not manage 
to reduce the NFAs enough. Finally, even a minimal NFA for the given set of regexes is often too big to be implemented in the given FPGA operating at the required speed (as our experiments also show). Our approach is capable of a much better reduction at the price of a small change of the accepted language.


2 Preliminaries 


We use $\langle a, b\rangle$ to denote the set $\{x \in \mathbb{R} \mid a \le x \le b\}$ and $\mathbb{N}$ to denote the set $\{0, 1, 2, \ldots\}$. Given a pair of sets $X_1$ and $X_2$, we use $X_1 \ominus X_2$ to denote their symmetric difference, i.e., the set $\{x \mid \exists! i \in \{1, 2\} : x \in X_i\}$. We use the notation $[v_1, \ldots, v_n]$ to denote a vector of $n$ elements, $\mathbf{1}$ to denote the all 1's vector $[1, \ldots, 1]$, $\mathbf{A}$ to denote a matrix, $\mathbf{A}^\top$ for its transpose, and $\mathbf{I}$ for the identity matrix.

In the following, we fix a finite non-empty alphabet $\Sigma$. A nondeterministic finite automaton (NFA) is a quadruple $A = (Q, \delta, I, F)$ where $Q$ is a finite set of states, $\delta : Q \times \Sigma \to 2^Q$ is a transition function, $I \subseteq Q$ is a set of initial states, and $F \subseteq Q$ is a set of accepting states. We use $Q[A]$, $\delta[A]$, $I[A]$, and $F[A]$ to denote $Q$, $\delta$, $I$, and $F$, respectively, and $q \xrightarrow{a} q'$ to denote that $q' \in \delta(q, a)$. A sequence of states $\rho = q_0 \cdots q_n$ is a run of $A$ over a word $w = a_1 \cdots a_n \in \Sigma^*$ from a state $q$ to a state $q'$, denoted as $q \overset{w,\rho}{\rightsquigarrow} q'$, if $\forall 1 \le i \le n : q_{i-1} \xrightarrow{a_i} q_i$, $q_0 = q$, and $q_n = q'$. Sometimes, we use $\rho$ in set operations where it behaves as the set of states it contains. We also use $q \overset{w}{\rightsquigarrow} q'$ to denote that $\exists \rho \in Q^* : q \overset{w,\rho}{\rightsquigarrow} q'$ and $q \rightsquigarrow q'$ to denote that $\exists w : q \overset{w}{\rightsquigarrow} q'$. The language of a state $q$ is defined as $L_A(q) = \{w \mid \exists q_F \in F : q \overset{w}{\rightsquigarrow} q_F\}$ and its banguage (back-language) is defined as $L^b_A(q) = \{w \mid \exists q_I \in I : q_I \overset{w}{\rightsquigarrow} q\}$. Both notions can be naturally extended to a set $S \subseteq Q$: $L_A(S) = \bigcup_{q \in S} L_A(q)$ and $L^b_A(S) = \bigcup_{q \in S} L^b_A(q)$. We drop the subscript $A$ when the context is obvious. $A$ accepts the language $L(A)$ defined as $L(A) = L_A(I)$. $A$ is called deterministic (DFA) if $|I| = 1$ and $\forall q \in Q\ \forall a \in \Sigma : |\delta(q, a)| \le 1$, and unambiguous (UFA) if $\forall w \in L(A) : \exists! q_I \in I, \rho \in Q^*, q_F \in F : q_I \overset{w,\rho}{\rightsquigarrow} q_F$.
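To illustrate these definitions, the following Python sketch encodes an NFA with a transition dictionary. The class name NFA and the methods accepts, is_deterministic, and count_accepting_runs are hypothetical helpers introduced for this illustration only. Membership in $L(A)$ is decided by tracking all states reachable by some run over the prefix read so far, and counting accepting runs gives a direct check of the UFA condition for a single word.

```python
from dataclasses import dataclass


@dataclass
class NFA:
    """NFA A = (Q, delta, I, F); delta maps (state, symbol) to a set of successors."""
    states: frozenset
    delta: dict          # (q, a) -> frozenset of q' with q -a-> q'; missing key = no move
    initial: frozenset
    final: frozenset

    def succ(self, q, a):
        return self.delta.get((q, a), frozenset())

    def accepts(self, word):
        # Track every state reachable from I by some run over the prefix read so far.
        current = set(self.initial)
        for a in word:
            current = {r for q in current for r in self.succ(q, a)}
        return bool(current & self.final)

    def is_deterministic(self, alphabet):
        # DFA: |I| = 1 and |delta(q, a)| <= 1 for all q in Q and a in Sigma.
        return len(self.initial) == 1 and all(
            len(self.succ(q, a)) <= 1 for q in self.states for a in alphabet)

    def count_accepting_runs(self, word):
        # counts[q] = number of distinct runs over the prefix read so far that end in q.
        counts = {q: 1 for q in self.initial}
        for a in word:
            nxt = {}
            for q, c in counts.items():
                for r in self.succ(q, a):
                    nxt[r] = nxt.get(r, 0) + c
            counts = nxt
        return sum(c for q, c in counts.items() if q in self.final)


# Example: words over {a, b} ending in "ab" (nondeterministic, yet unambiguous).
A = NFA(states=frozenset({0, 1, 2}),
        delta={(0, "a"): frozenset({0, 1}), (0, "b"): frozenset({0}),
               (1, "b"): frozenset({2})},
        initial=frozenset({0}), final=frozenset({2}))
print(A.accepts("aab"), A.is_deterministic({"a", "b"}), A.count_accepting_runs("aab"))
# prints: True False 1
```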
The restriction of $A$ to $S \subseteq Q$ is an NFA $A_{|S}$ given as $A_{|S} = (S, \delta \cap (S \times \Sigma \times 2^S), I \cap S, F \cap S)$. We define the trim operation as $\mathit{trim}(A) = A_{|C}$ where $C = \{q \mid \exists q_I \in I, q_F \in F : q_I \rightsquigarrow q \rightsquigarrow q_F\}$. For a set of states $R \subseteq Q$, we use $\mathit{reach}(R)$ to denote the set of states reachable from $R$, formally, $\mathit{reach}(R) = \{r' \mid \exists r \in R : r \rightsquigarrow r'\}$. We use the number of states as the measurement of the size of $A$, i.e., $|A| = |Q|$.
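The restriction, trim, and reach operations admit a straightforward graph-search implementation. The sketch below is one possible encoding under our own assumptions (a NamedTuple with a transition dictionary, and a restrict that keeps only transitions whose source and target both lie in $S$); all names are illustrative. trim keeps exactly the states that are reachable from $I$ and from which $F$ is reachable.

```python
from typing import NamedTuple


class NFA(NamedTuple):
    states: frozenset
    delta: dict          # (q, a) -> set of successor states
    initial: frozenset
    final: frozenset


def reach(delta, R):
    """reach(R) = {r' | exists r in R: r ~> r'}; contains R itself (empty word)."""
    adj = {}
    for (p, _a), succs in delta.items():
        adj.setdefault(p, set()).update(succs)
    seen, stack = set(R), list(R)
    while stack:
        for r in adj.get(stack.pop(), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)


def restrict(A, S):
    """A|S: keep the states in S and the transitions between them."""
    S = frozenset(S)
    delta = {(p, a): frozenset(t) & S for (p, a), t in A.delta.items()
             if p in S and frozenset(t) & S}
    return NFA(S, delta, frozenset(A.initial) & S, frozenset(A.final) & S)


def trim(A):
    """trim(A) = A|C, where C holds the states lying on some run from I to F."""
    rev = {}                                     # reversed transition relation
    for (p, a), succs in A.delta.items():
        for r in succs:
            rev.setdefault((r, a), set()).add(p)
    useful = reach(A.delta, A.initial) & reach(rev, A.final)
    return restrict(A, useful)


A = NFA(frozenset({0, 1, 2, 3}), {(0, "a"): {1}, (1, "b"): {2}, (0, "b"): {3}},
        frozenset({0}), frozenset({2}))
print(sorted(trim(A).states))   # [0, 1, 2]; state 3 cannot reach F
```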

A (discrete probability) distribution over a set $X$ is a mapping $\Pr : X \to \langle 0, 1\rangle$ such that $\sum_{x \in X} \Pr(x) = 1$. An $n$-state probabilistic automaton (PA) over $\Sigma$ is a triple $P = (\alpha, \gamma, \{\Delta_a\}_{a \in \Sigma})$ where $\alpha \in \langle 0, 1\rangle^n$ is a vector of initial weights, $\gamma \in \langle 0, 1\rangle^n$ is a vector of final weights, and for every $a \in \Sigma$, $\Delta_a \in \langle 0, 1\rangle^{n \times n}$ is a transition matrix for symbol $a$. We abuse notation and use $Q[P]$ to denote the set of states $Q[P] = \{1, \ldots, n\}$. Moreover, the following two properties need to hold: (i) $\sum\{\alpha[i] \mid i \in Q[P]\} = 1$ (the initial probability is 1) and (ii) for every state $i \in Q[P]$ it holds that $\sum\{\Delta_a[i, j] \mid j \in Q[P], a \in \Sigma\} + \gamma[i] = 1$ (the probability of accepting or leaving a state is 1). We define the support of $P$ as the NFA $\mathit{supp}(P) = (Q[P], \delta[P], I[P], F[P])$ s.t.


$$\delta[P] = \{(i, a, j) \mid \Delta_a[i, j] > 0\} \qquad I[P] = \{i \mid \alpha[i] > 0\} \qquad F[P] = \{i \mid \gamma[i] > 0\}.$$


Let us assume that every PA $P$ is such that $\mathit{supp}(P) = \mathit{trim}(\mathit{supp}(P))$. For a word $w = a_1 \ldots a_n \in \Sigma^*$, we use $\Delta_w$ to denote the matrix $\Delta_{a_1} \cdots \Delta_{a_n}$. It can be easily shown that $P$ represents a distribution over words $w \in \Sigma^*$ defined as $\Pr_P(w) = \alpha^\top \cdot \Delta_w \cdot \gamma$. We call $\Pr_P(w)$ the probability of $w$ in $P$. Given a language $L \subseteq \Sigma^*$, we define the probability of $L$ in $P$ as $\Pr_P(L) = \sum_{w \in L} \Pr_P(w)$.
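To see the matrix formula for $\Pr_P(w)$ in action, the following sketch (assuming NumPy; the helpers is_pa and word_prob are hypothetical) checks Conditions (i) and (ii) and evaluates $\alpha^\top \cdot \Delta_w \cdot \gamma$ by pushing a row vector through the word. The one-state PA in the example stops with probability 1/2 and otherwise emits a or b with probability 1/4 each, so a word $w$ has probability $0.5 \cdot 0.25^{|w|}$.

```python
import numpy as np


def is_pa(alpha, gamma, deltas, tol=1e-9):
    """Conditions (i) and (ii): initial weights sum to 1; each state's leaving + final mass is 1."""
    out_mass = sum(D.sum(axis=1) for D in deltas.values()) + gamma
    return abs(alpha.sum() - 1.0) < tol and np.allclose(out_mass, 1.0, atol=tol)


def word_prob(alpha, gamma, deltas, word):
    """Pr_P(w) = alpha^T . Delta_{a1} ... Delta_{an} . gamma, evaluated left to right."""
    v = alpha
    for a in word:
        v = v @ deltas[a]          # row vector times transition matrix of symbol a
    return float(v @ gamma)


# One-state PA over {a, b}.
alpha = np.array([1.0])
gamma = np.array([0.5])
deltas = {"a": np.array([[0.25]]), "b": np.array([[0.25]])}
print(is_pa(alpha, gamma, deltas), word_prob(alpha, gamma, deltas, "ab"))   # True 0.03125
```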

If Conditions (i) and (ii) from the definition of PAs are dropped, we speak about a pseudo-probabilistic automaton (PPA), which may assign a word from its support a quantity that is not necessarily in the range $\langle 0, 1\rangle$; below, we call this quantity the significance of the word. PPAs may arise during some of our operations performed on PAs.


3 Approximate Reduction of NFAs 


In this section, we first introduce the key notion of our approach: a probabilis- 
tic distance of a pair of finite automata wrt a given probabilistic automaton 
that, intuitively, represents the significance of particular words. We discuss the 
complexity of computing the probabilistic distance. Finally, we formulate two 
problems of approximate automata reduction via probabilistic distance. Proofs of 
the lemmas can be found in [43]. 


3.1 Probabilistic Distance 


We start by defining our notion of a probabilistic distance of two NFAs. Assume NFAs $A_1$ and $A_2$ and a probabilistic automaton $P$ specifying the distribution $\Pr_P : \Sigma^* \to \langle 0, 1\rangle$. The probabilistic distance $d_P(A_1, A_2)$ between $A_1$ and $A_2$ wrt $\Pr_P$ is defined as

$$d_P(A_1, A_2) = \Pr_P(L(A_1) \ominus L(A_2)).$$


Intuitively, the distance captures the significance of the words accepted by one 
of the automata only. We use the distance to drive the reduction process towards 
automata with small errors and to assess the quality of the resulting automata. 

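The value of $\Pr_P(L(A_1) \ominus L(A_2))$ can be computed as follows. Using the facts that (1) $L_1 \ominus L_2 = (L_1 \setminus L_2) \uplus (L_2 \setminus L_1)$ and (2) $L_1 \setminus L_2 = L_1 \setminus (L_1 \cap L_2)$, together with the additivity of $\Pr_P$ over disjoint languages, we get

$$\begin{aligned}
d_P(A_1, A_2) &= \Pr_P(L(A_1) \setminus L(A_2)) + \Pr_P(L(A_2) \setminus L(A_1)) \\
&= \Pr_P(L(A_1) \setminus (L(A_1) \cap L(A_2))) + \Pr_P(L(A_2) \setminus (L(A_2) \cap L(A_1))) \\
&= \Pr_P(L(A_1)) + \Pr_P(L(A_2)) - 2 \cdot \Pr_P(L(A_1) \cap L(A_2)).
\end{aligned}$$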
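The identity can be sanity-checked numerically. The sketch below deliberately uses a toy distribution with finite support, given as a Python dictionary, instead of a PA, so that languages are plain finite sets; the function names and data are hypothetical. Both computations agree up to floating-point rounding.

```python
def prob(dist, L):
    """Pr(L) for a finite-support distribution given as {word: probability}."""
    return sum(dist.get(w, 0.0) for w in L)


def distance_direct(dist, L1, L2):
    # ^ is Python's set symmetric difference.
    return prob(dist, L1 ^ L2)


def distance_via_identity(dist, L1, L2):
    # Pr(L1) + Pr(L2) - 2 * Pr(L1 & L2), as in the derivation above.
    return prob(dist, L1) + prob(dist, L2) - 2 * prob(dist, L1 & L2)


dist = {"": 0.1, "a": 0.2, "b": 0.3, "ab": 0.4}
L1, L2 = {"a", "ab"}, {"b", "ab"}
print(distance_direct(dist, L1, L2), distance_via_identity(dist, L1, L2))
```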


Hence, the key step is to compute $\Pr_P(L(A))$ for an NFA $A$ and a PA $P$. Problems similar to computing such a probability have been extensively studied in several contexts, including verification of probabilistic systems [34-36]. The following lemma summarises the complexity of this step.


Lemma 1. Let $P$ be a PA and $A$ an NFA. The problem of computing $\Pr_P(L(A))$ is PSPACE-complete. For a UFA $A$, $\Pr_P(L(A))$ can be computed in PTIME.


In our approach, we apply the method of [36] and compute $\Pr_P(L(A))$ in the following way. We first check whether the NFA $A$ is unambiguous. This can be done by using the standard product construction (denoted as $\cap$) for computing the intersection of the NFA $A$ with itself and trimming the result, formally $B = \mathit{trim}(A \cap A)$, followed by a check whether there is some state $(p, q) \in Q[B]$ s.t. $p \neq q$ [37]. If $A$ is ambiguous, we either determinise it or disambiguate it [37],¹ leading to a DFA/UFA $A'$, respectively. Then, we construct the trimmed product of $A'$ and $P$ (this can be seen as computing $A' \cap \mathit{supp}(P)$ while keeping the probabilities from $P$ on the edges of the result), yielding a PPA $R = (\alpha, \gamma, \{\Delta_a\}_{a \in \Sigma})$.² Intuitively, $R$ represents not only the words of $L(A)$ but also their probability in $P$. Now, let $\Delta = \sum_{a \in \Sigma} \Delta_a$ be the matrix that expresses, for any $p, q \in Q[R]$, the significance of getting from $p$ to $q$ via any $a \in \Sigma$. Further, it can be shown (cf. the proof of Lemma 1 in [43]) that the matrix $\Delta^*$, representing the significance of going from $p$ to $q$ via any $w \in \Sigma^*$, can be computed as $(\mathbf{I} - \Delta)^{-1}$. Then, to get $\Pr_P(L(A))$, it suffices to take $\alpha^\top \cdot \Delta^* \cdot \gamma$. Note that, due to the determinisation/disambiguation step, the obtained value indeed is $\Pr_P(L(A))$ despite $R$ being a PPA.
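The product-and-inverse computation can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: the hypothetical function lang_prob takes a partial DFA as input (determinisation is assumed to have been done already), builds the product over all state pairs without trimming (unreachable or useless pairs contribute 0 to the result), and solves $(\mathbf{I} - \Delta)x = \gamma$ with NumPy instead of forming $\Delta^*$ explicitly. We rely on $P$ being trimmed, which keeps the spectral radius of $\Delta$ below 1, so the linear system is solvable.

```python
import numpy as np


def lang_prob(dfa, alpha, gamma, deltas):
    """Pr_P(L(A)) for a partial DFA A = (states, delta, q0, final).

    delta maps (state, symbol) -> state; the PA P is given by alpha, gamma and
    one n x n matrix deltas[a] per symbol, over P's states 0..n-1.
    """
    states, delta, q0, final = dfa
    n = len(alpha)
    pairs = [(q, i) for q in states for i in range(n)]   # product states (q, i)
    idx = {pair: k for k, pair in enumerate(pairs)}
    N = len(pairs)
    D = np.zeros((N, N))                 # Delta = sum over a of the product's Delta_a
    a_vec, g_vec = np.zeros(N), np.zeros(N)
    for (q, i), k in idx.items():
        if q == q0:
            a_vec[k] = alpha[i]
        if q in final:
            g_vec[k] = gamma[i]
        for a, Da in deltas.items():
            q2 = delta.get((q, a))
            if q2 is None:               # DFA blocks on a: this probability mass is dropped
                continue
            for j in range(n):
                D[k, idx[(q2, j)]] += Da[i, j]
    # alpha^T (I - Delta)^{-1} gamma, via a linear solve.
    x = np.linalg.solve(np.eye(N) - D, g_vec)
    return float(a_vec @ x)


alpha, gamma = np.array([1.0]), np.array([0.5])
deltas = {"a": np.array([[0.25]]), "b": np.array([[0.25]])}
b_star = ({0}, {(0, "b"): 0}, 0, {0})                                   # L = b*
ends_in_a = ({0, 1}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}, 0, {1})
print(lang_prob(b_star, alpha, gamma, deltas))      # 0.5 / (1 - 0.25) = 2/3, up to rounding
print(lang_prob(ends_in_a, alpha, gamma, deltas))   # 1/4, up to rounding
```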


1 In theory, disambiguation can produce smaller automata, but, in our experiments, 
determinisation proved to work better. 

2 R is not necessarily a PA since there might be transitions in P that are either removed or copied several times in the product construction.
3.2 Automata Reduction Using Probabilistic Distance


We now exploit the above introduced probabilistic distance to formulate the task 
of approximate reduction of NFAs as the following two optimisation problems. 
Given an NFA $A$ and a PA $P$ specifying the distribution $\Pr_P : \Sigma^* \to \langle 0, 1\rangle$, we define

– size-driven reduction: for $n \in \mathbb{N}$, find an NFA $A'$ such that $|A'| \le n$ and the distance $d_P(A, A')$ is minimal,

– error-driven reduction: for $\varepsilon \in \langle 0, 1\rangle$, find an NFA $A'$ such that $d_P(A, A') \le \varepsilon$ and the size $|A'|$ is minimal.


The following lemma shows that the natural decision problem underlying both 
of the above optimization problems is PSPACE-complete, which matches the 
complexity of computing the probabilistic distance as well as that of the exact 
reduction of NFAs [38]. 


Lemma 2. Consider an NFA $A$, a PA $P$, a bound on the number of states $n \in \mathbb{N}$, and an error bound $\varepsilon \in \langle 0, 1\rangle$. It is PSPACE-complete to determine whether there exists an NFA $A'$ with $n$ states s.t. $d_P(A, A') \le \varepsilon$.


The notions defined above do not distinguish between introducing false positive answers ($A'$ accepts a word $w \notin L(A)$) and false negative answers ($A'$ does not accept a word $w \in L(A)$). To capture this distinction, we define over-approximating and under-approximating reductions as reductions for which the additional conditions $L(A) \subseteq L(A')$ and $L(A) \supseteq L(A')$ hold, respectively.

A naive solution to the reductions would enumerate all NFAs $A'$ of sizes from 0 up to $n$ (resp. $|A|$), compute $d_P(A, A')$ for each of them, and take an automaton with the smallest probabilistic distance (resp. a smallest one satisfying the restriction on $d_P(A, A')$). Obviously, this approach is computationally infeasible.


4 A Heuristic Approach to Approximate Reduction 


In this section, we introduce two techniques for approximate reduction of NFAs 
that avoid the need to iterate over all automata of a certain size. The first 
approach under-approximates the automata by removing states—we call it the 
pruning reduction—while the second approach over-approximates the automata 
by adding self-loops to states and removing redundant states—we call it the 
self-loop reduction. Finding an optimal automaton using these reductions is also 
PSPACE-complete, but more amenable to heuristics like greedy algorithms. We 
start by introducing two high-level greedy algorithms, one for the size-driven and one for the error-driven reduction, and then show their instantiations for the pruning and the self-loop reduction. A crucial role in the algorithms is played by a function that labels states of the automata with an estimate of the error that will be caused when one of the reductions is applied at a given state.
4.1 A General Algorithm for Size-Driven Reduction


Algorithm 1 shows a general greedy method for performing the size-driven reduction. In order to use the same high-level algorithm in both directions of reduction (over/under-approximating), it is parameterized with three functions: label, reduce, and error. The real intricacy of the procedure is hidden inside these three functions. Intuitively, $\mathit{label}(A, P)$ assigns every state of an NFA $A$ an approximation of the error that will be caused wrt the PA $P$ when a reduction is applied at this state, while the purpose of $\mathit{reduce}(A, V)$ is to create a new NFA $A'$ obtained from $A$ by introducing some error at states from $V$.³ Further, $\mathit{error}(A, V, \mathit{label}(A, P))$ estimates the error introduced by the application of $\mathit{reduce}(A, V)$, possibly in a more precise (and costly) way than by just summing the concerned error labels; such a computation is possible outside of the main computation loop. We show instantiations of these functions later, when discussing the reductions used. Moreover, the algorithm is also parameterized with a total order $\preceq_{A,\mathit{label}(A,P)}$ that defines which states of $A$ are processed first and which are processed later. The ordering may take into account the precomputed labelling. The algorithm accepts an NFA $A$, a PA $P$, and $n \in \mathbb{N}$ and outputs a pair consisting of an NFA $A'$ of the size $|A'| \le n$ and an error bound $\varepsilon$ such that $d_P(A, A') \le \varepsilon$.

The main idea of the algorithm is that it creates a set $V$ of states where an error is to be introduced. $V$ is constructed by starting from an empty set and adding states to it in the order given by $\preceq_{A,\mathit{label}(A,P)}$, until the size of the result of $\mathit{reduce}(A, V)$ has reached the desired bound $n$ (in our setting, reduce is always antitone, i.e., for $V \subseteq V'$, it holds that $|\mathit{reduce}(A, V)| \ge |\mathit{reduce}(A, V')|$). We now define the necessary condition for label, reduce, and error that makes Algorithm 1 correct.


Condition C1 holds if for every NFA $A$, PA $P$, and a set $V \subseteq Q[A]$, we have that (a) $\mathit{error}(A, V, \mathit{label}(A, P)) \ge d_P(A, \mathit{reduce}(A, V))$, (b) $|\mathit{reduce}(A, Q[A])| \le 1$, and (c) $\mathit{reduce}(A, \emptyset) = A$.


C1(a) ensures that the error computed by the reduction algorithm indeed over-approximates the exact probabilistic distance, C1(b) ensures that, for any $n \ge 1$, the algorithm can output a result $A'$ of the size $|A'| \le n$ (in the worst case, by applying the reduction at every state of $A$), and C1(c) ensures that when no error is to be introduced at any state, we obtain the original automaton.


Lemma 3. Algorithm 1 is correct if C1 holds. 


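Algorithm 1. A greedy size-driven reduction
Input: NFA $A = (Q, \delta, I, F)$, PA $P$, $n \ge 1$
Output: NFA $A'$, $\varepsilon \in \mathbb{R}$ s.t. $|A'| \le n$ and $d_P(A, A') \le \varepsilon$
    $V \leftarrow \emptyset$;
    for $q \in Q$ in the order $\preceq_{A,\mathit{label}(A,P)}$ do
        $V \leftarrow V \cup \{q\}$; $A' \leftarrow \mathit{reduce}(A, V)$;
        if $|A'| \le n$ then break;
    return $A'$, $\varepsilon = \mathit{error}(A, V, \mathit{label}(A, P))$;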
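Algorithm 1 can be transcribed into Python with label, reduce, error, and the state order passed in as functions. The transcription below is only a sketch: the parameter names are ours, the labelling is computed once up front, and the toy instantiation (an "automaton" that is just its set of states, reduced by dropping states whose labels are then summed) is hypothetical and merely exercises the control flow.

```python
def size_driven_reduction(A, P, n, label, reduce, error, order, size):
    """Greedy size-driven reduction following Algorithm 1.

    label(A, P)       -> per-state error estimates
    reduce(A, V)      -> automaton with error introduced at the states in V
    error(A, V, lab)  -> bound on d_P(A, reduce(A, V)), cf. C1(a)
    order(A, lab)     -> states of A in the order in which they are tried
    size(B)           -> |B|
    """
    lab = label(A, P)                 # precomputed once, outside the loop
    V = set()
    reduced = reduce(A, frozenset(V))     # equals A by C1(c)
    for q in order(A, lab):
        V.add(q)
        reduced = reduce(A, frozenset(V))
        if size(reduced) <= n:
            break
    return reduced, error(A, frozenset(V), lab)


# Toy instantiation (hypothetical): states carry precomputed labels.
labels = {"q0": 0.4, "q1": 0.1, "q2": 0.05, "q3": 0.2}
A = frozenset(labels)
reduced, eps = size_driven_reduction(
    A, labels, 2,
    label=lambda A, P: P,
    reduce=lambda A, V: A - V,
    error=lambda A, V, lab: sum(lab[q] for q in V),
    order=lambda A, lab: sorted(A, key=lab.get),
    size=len)
print(sorted(reduced), round(eps, 2))   # ['q0', 'q3'] 0.15
```

States with the smallest labels are tried first, so the two cheapest states (q2, then q1) are reduced before the size bound of 2 is met.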


3 We emphasize that this does not mean that states from V will be simply removed 
from A—the performed operation depends on the particular reduction. 
4.2 A General Algorithm for Error-Driven Reduction


In Algorithm 2, we provide a high-level method of computing the error-driven reduction. The algorithm is in many ways similar to Algorithm 1; it also computes a set of states $V$ where an error is to be introduced, but an important difference is that we compute an approximation of the error in each step and only add $q$ to $V$ if it does not raise the error over the threshold $\varepsilon$. Note that the error does not need to be monotone, so it may be advantageous to traverse all states from $Q$ and not terminate as soon as the threshold is reached. The correctness of Algorithm 2 also depends on C1.


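Algorithm 2. A greedy error-driven reduction.
Input: NFA $A = (Q, \delta, I, F)$, PA $P$, $\varepsilon \in \langle 0, 1\rangle$
Output: NFA $A'$ s.t. $d_P(A, A') \le \varepsilon$
    $\ell \leftarrow \mathit{label}(A, P)$;
    $V \leftarrow \emptyset$;
    for $q \in Q$ in the order $\preceq_{A,\mathit{label}(A,P)}$ do
        $e \leftarrow \mathit{error}(A, V \cup \{q\}, \ell)$;
        if $e \le \varepsilon$ then $V \leftarrow V \cup \{q\}$;
    return $A' = \mathit{reduce}(A, V)$;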
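A matching Python transcription of Algorithm 2 is sketched below, again with hypothetical parameter names and the same toy instantiation as a stand-in for real automata. Following the remark on non-monotone errors, the loop never exits early: a state that would exceed the threshold is skipped, and later states are still tried.

```python
def error_driven_reduction(A, P, eps, label, reduce, error, order):
    """Greedy error-driven reduction following Algorithm 2."""
    lab = label(A, P)
    V = set()
    for q in order(A, lab):
        e = error(A, frozenset(V | {q}), lab)
        if e <= eps:          # keep q only if the error estimate stays within eps
            V.add(q)
    return reduce(A, frozenset(V))


# Toy instantiation (hypothetical): states carry precomputed labels.
labels = {"q0": 0.4, "q1": 0.1, "q2": 0.05, "q3": 0.2}
A = frozenset(labels)
kwargs = dict(label=lambda A, P: P,
              reduce=lambda A, V: A - V,
              error=lambda A, V, lab: sum(lab[q] for q in V),
              order=lambda A, lab: sorted(A, key=lab.get))
print(sorted(error_driven_reduction(A, labels, 0.3, **kwargs)))   # ['q0', 'q3']
```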


Lemma 4. Algorithm 2 is correct if C1 holds. 


4.3 Pruning Reduction 


The pruning reduction is based on identifying a set of states to be removed from an NFA $A$, under-approximating the language of $A$. In particular, for $A = (Q, \delta, I, F)$, the pruning reduction finds a set $R \subseteq Q$ and restricts $A$ to $Q \setminus R$, followed by removing useless states, to construct a reduced automaton $A' = \mathit{trim}(A_{|Q \setminus R})$. Note that the natural decision problem corresponding to this reduction is also PSPACE-complete.


Lemma 5. Consider an NFA $A$, a PA $P$, a bound on the number of states $n \in \mathbb{N}$, and an error bound $\varepsilon \in \langle 0, 1\rangle$. It is PSPACE-complete to determine whether there exists a subset of states $R \subseteq Q[A]$ of the size $|R| = n$ such that $d_P(A, A_{|R}) \le \varepsilon$.


Although Lemma 5 shows that the pruning reduction is as hard as a general 
reduction (cf. Lemma 2), the pruning reduction is more amenable to the use 
of heuristics like the greedy algorithms from Sects. 4.1 and 4.2. We instantiate 
reduce, error, and label in these high-level algorithms in the following way (the 
subscript p means pruning): 

$$\mathit{reduce}_p(A, V) = \mathit{trim}(A_{|Q \setminus V}), \qquad \mathit{error}_p(A, V, \ell) = \min_{V' \in \lfloor V \rfloor_p} \sum\{\ell(q) \mid q \in V'\},$$

where $\lfloor V \rfloor_p$ is defined as follows. Because of the use of trim in $\mathit{reduce}_p$, for a pair of sets $V, V'$ s.t. $V \subseteq V'$, $\mathit{reduce}_p(A, V)$ may, in general, yield the same automaton as $\mathit{reduce}_p(A, V')$. Hence, we define a partial order $\sqsubseteq_p$ on $2^Q$ as $V_1 \sqsubseteq_p V_2$ iff $\mathit{reduce}_p(A, V_1) = \mathit{reduce}_p(A, V_2)$ and $V_1 \subseteq V_2$, and use $\lfloor V \rfloor_p$ to denote the set of minimal elements of $\{V' \mid V' \sqsubseteq_p V\}$ wrt $\sqsubseteq_p$. The value of the approximation $\mathit{error}_p(A, V, \ell)$ is therefore the minimum of the sum of errors over all sets from $\lfloor V \rfloor_p$.
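A sketch of the pruning instantiation follows, with $\lfloor V \rfloor_p$ replaced by a single greedily shrunk set $V'$ (in the spirit of the greedy guess discussed in the next paragraph). The NFA encoding and the names reduce_p and error_p_greedy are illustrative assumptions. Since the greedy $V'$ yields the same reduced automaton as $V$, its label sum is at least the exact minimum, so the bound stays conservative. In the example, removing state 4 alone already makes state 3 useless, so only one label is charged.

```python
from typing import NamedTuple


class NFA(NamedTuple):
    states: frozenset
    delta: dict          # (q, a) -> set of successor states
    initial: frozenset
    final: frozenset


def restrict(A, S):
    """A|S: keep the states in S and the transitions between them."""
    S = frozenset(S)
    delta = {(p, a): frozenset(t) & S for (p, a), t in A.delta.items()
             if p in S and frozenset(t) & S}
    return NFA(S, delta, A.initial & S, A.final & S)


def _closure(edges, start):
    seen, stack = set(start), list(start)
    while stack:
        for r in edges.get(stack.pop(), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def trim(A):
    fwd, bwd = {}, {}
    for (p, _a), t in A.delta.items():
        for r in t:
            fwd.setdefault(p, set()).add(r)
            bwd.setdefault(r, set()).add(p)
    return restrict(A, _closure(fwd, A.initial) & _closure(bwd, A.final))


def reduce_p(A, V):
    """reduce_p(A, V) = trim of A restricted to Q minus V."""
    return trim(restrict(A, A.states - frozenset(V)))


def error_p_greedy(A, V, labels):
    """Label sum over a greedily shrunk V' with reduce_p(A, V') == reduce_p(A, V)."""
    target = reduce_p(A, V)
    kept = set(V)
    for q in sorted(V, key=lambda s: -labels[s]):   # try to drop costly states first
        if reduce_p(A, kept - {q}) == target:
            kept.discard(q)
    return sum(labels[q] for q in kept)


# Two "tentacles": 0 -a-> 1 -b-> 2 and 0 -c-> 3 -d-> 4, with 2 and 4 accepting.
A = NFA(frozenset(range(5)),
        {(0, "a"): {1}, (1, "b"): {2}, (0, "c"): {3}, (3, "d"): {4}},
        frozenset({0}), frozenset({2, 4}))
labels = {0: 1.0, 1: 0.3, 2: 0.3, 3: 0.1, 4: 0.1}
print(sorted(reduce_p(A, {3, 4}).states))                        # [0, 1, 2]
print(error_p_greedy(A, {3, 4}, labels), labels[3] + labels[4])  # 0.1 0.2
```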

Note that the size of $\lfloor V \rfloor_p$ can again be exponential, and thus we employ
a greedy approach for guessing an optimal $V'$. Clearly, this cannot affect the
soundness of the algorithm, but only decreases the precision of the bound on 
the distance. Our experiments indicate that for automata appearing in NIDSes, 
this simplification has typically only a negligible impact on the precision of the 
bounds. 

For computing the state labelling, we provide the following three functions, which differ in the precision they provide and the difficulty of their computation (naturally, more precise labellings are harder to compute): $\mathit{label}^1_p$, $\mathit{label}^2_p$, and $\mathit{label}^3_p$. Given an NFA $A$ and a PA $P$, they generate the labellings $\ell_1$, $\ell_2$, and $\ell_3$, respectively, defined as

$$\ell_1(q) = \sum\{\Pr_P(L^b_A(q')) \mid q' \in \mathit{reach}(\{q\}) \cap F\},$$

$$\ell_2(q) = \Pr_P\big(L^b_A(F \cap \mathit{reach}(q))\big), \qquad \ell_3(q) = \Pr_P\big(L^b_A(q) \cdot L_A(q)\big).$$


A state label $\ell(q)$ approximates the error of the words removed from $L(A)$ when $q$ is removed. More concretely, $\ell^1_p(q)$ is a rough estimate saying that the error can be bounded by the sum of probabilities of the banguages of all final states reachable from $q$ (in the worst case, all those final states might become unreachable). Note that $\ell^1_p(q)$ (1) counts the error of a word accepted in two different final states of $\mathit{reach}(q)$ twice, and (2) also considers words that are accepted in some final state in $\mathit{reach}(q)$ without going through $q$. The labelling $\ell^2_p$ deals with (1) by computing the total probability of the banguage of the set of all final states reachable from $q$, and the labelling $\ell^3_p$ in addition also deals with (2) by only considering words that traverse through $q$ (they can still be accepted in some final state not in $\mathit{reach}(q)$, though, so even $\ell^3_p$ is still imprecise). Note that if $A$ is unambiguous, then $\ell^1_p = \ell^2_p$.
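The difference between $\ell^1_p$ and $\ell^2_p$ can be seen on a finite toy example. The sketch below assumes a hypothetical distribution over three packets and two hypothetical final states reachable from $q$ whose banguages share the word "ab"; it is only an illustration of source (1) of imprecision, not the PA-based computation used by the tool.

```python
from fractions import Fraction

# Hypothetical toy distribution Pr_P over three packets (it sums to 1).
PR = {"ab": Fraction(1, 4), "abc": Fraction(1, 4), "b": Fraction(1, 2)}


def pr(language):
    """Pr_P(L) of a finite language L under the toy distribution PR."""
    return sum((PR.get(w, Fraction(0)) for w in language), Fraction(0))


# Hypothetical banguages of the two final states reachable from q;
# the word "ab" reaches both of them.
banguage = {"f1": {"ab", "abc"}, "f2": {"ab"}}

# l^1_p(q): sum over the final states, so "ab" is counted twice.
l1 = sum((pr(L) for L in banguage.values()), Fraction(0))
# l^2_p(q): banguage of the whole set of final states, each word once.
l2 = pr(set().union(*banguage.values()))
print(l1, l2)  # 3/4 1/2
```

The gap of 1/4 is exactly the probability of the doubly counted word "ab".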

When computing the label of q, we first modify A to obtain A’ accepting 
the language related to the particular labelling. Then, we compute the value of 
Prp(L(A’)) using the algorithm from Sect. 3.1. Recall that this step is in general 
costly, due to the determinisation/disambiguation of A’. The key property of 
the labelling computation resides in the fact that if A is composed of several 
disjoint sub-automata, the automaton A’ is typically much smaller than A and 
thus the computation of the label is considerably less demanding. Since the
automata appearing in regex matching for NIDS are composed of the union of 
“tentacles”, the particular A’s are very small, which enables efficient component- 
wise computation of the labels. 

The following lemma states the correctness of using the pruning reduction as an instantiation of Algorithms 1 and 2 and also the relation among $\ell^1_p$, $\ell^2_p$, and $\ell^3_p$.

Lemma 6. For every $x \in \{1, 2, 3\}$, the functions $\mathrm{reduce}_p$, $\mathrm{error}_p$, and $\mathrm{label}^x_p$ satisfy C1. Moreover, consider an NFA $A$, a PA $P$, and let $\ell^x_p = \mathrm{label}^x_p(A, P)$ for $x \in \{1, 2, 3\}$. Then, for each $q \in Q[A]$, we have $\ell^1_p(q) \geq \ell^2_p(q) \geq \ell^3_p(q)$.


4.4 Self-loop Reduction 


The main idea of the self-loop reduction is to over-approximate the language 
of A by adding self-loops over every symbol at selected states. This makes some 
states of A redundant, allowing them to be removed without introducing any 
more error. Given an NFA $A = (Q, \delta, I, F)$, the self-loop reduction searches for
a set of states $R \subseteq Q$, which will have self-loops added, and removes other
transitions leading out of these states, making some states unreachable. The 
unreachable states are then removed. 

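Formally, let $sl(A, R)$ be the NFA $(Q, \delta', I, F \cup R)$ whose transition function $\delta'$ is defined, for all $p \in Q$ and $a \in \Sigma$, as $\delta'(p, a) = \{p\}$ if $p \in R$ and $\delta'(p, a) = \delta(p, a)$ otherwise (the states in $R$ must be accepting, otherwise a state with only self-loops would accept nothing and the language would shrink instead of growing). As with the pruning reduction, the natural decision problem corresponding to the self-loop reduction is also PSPACE-complete.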
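A minimal sketch of $sl(A, R)$ followed by trimming, on an explicit dictionary-based NFA. The encoding and function names are hypothetical. As an assumption needed for an over-approximation, the states in $R$ are made accepting (the final set becomes $F \cup R$), since a non-accepting state with only self-loops would accept nothing.

```python
def sl(nfa, R):
    """sl(A, R): every state in R gets a self-loop over every symbol and
    loses its other outgoing transitions. As an assumption needed for an
    over-approximation, the states in R are also made accepting."""
    states, alphabet, delta, init, final = nfa
    new_delta = {
        (p, a): {p} if p in R else set(delta.get((p, a), set()))
        for p in states
        for a in alphabet
    }
    return states, alphabet, new_delta, init, final | R


def trim(nfa):
    """Keep only states reachable from I and co-reachable to F."""
    states, alphabet, delta, init, final = nfa
    succ = {p: set() for p in states}
    pred = {p: set() for p in states}
    for (p, _a), targets in delta.items():
        for q in targets:
            succ[p].add(q)
            pred[q].add(p)

    def closure(start, edges):
        seen, stack = set(start), list(start)
        while stack:
            for q in edges[stack.pop()]:
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    useful = closure(init, succ) & closure(final, pred)
    new_delta = {(p, a): qs & useful for (p, a), qs in delta.items() if p in useful}
    return useful, alphabet, new_delta, init & useful, final & useful


# Toy NFA accepting only "aba"; a self-loop at state 1 over-approximates it
# to all words starting with "a" and makes states 2 and 3 useless.
A = ({0, 1, 2, 3}, {"a", "b"},
     {(0, "a"): {1}, (1, "b"): {2}, (2, "a"): {3}}, {0}, {3})
states, _, d, init, final = trim(sl(A, {1}))
print(sorted(states), sorted(final), d[(1, "b")])  # [0, 1] [1] {1}
```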


Lemma 7. Consider an NFA $A$, a PA $P$, a bound on the number of states $n \in \mathbb{N}$, and an error bound $\epsilon \in (0, 1)$. It is PSPACE-complete to determine whether there exists a subset of states $R \subseteq Q[A]$ of the size $|R| = n$ such that $d_P(A, sl(A, R)) \leq \epsilon$.


The required functions in the error- and size-driven reduction algorithms are instantiated in the following way (the subscript sl means self-loop):

$$\mathrm{reduce}_{sl}(A, V) = \mathit{trim}(sl(A, V)), \qquad \mathrm{error}_{sl}(A, V, \ell) = \sum \{ \ell(q) \mid q \in \min(\lfloor V \rfloor_{sl}) \},$$

where $\lfloor V \rfloor_{sl}$ is defined in a similar manner as $\lfloor V \rfloor_p$ in the previous section (using a partial order $\sqsubseteq_{sl}$ defined similarly to $\sqsubseteq_p$; in this case, the order $\sqsubseteq_{sl}$ has a single minimal element, though).

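The functions $\mathrm{label}^1_{sl}$, $\mathrm{label}^2_{sl}$, and $\mathrm{label}^3_{sl}$ compute the state labellings $\ell^1_{sl}$, $\ell^2_{sl}$, and $\ell^3_{sl}$ for an NFA $A$ and a PA $P$, defined as follows:

$$\ell^1_{sl}(q) = \mathit{weight}_P(L^\flat_A(q)), \qquad \ell^2_{sl}(q) = \Pr_P(L^\flat_A(q) . \Sigma^*),$$

$$\ell^3_{sl}(q) = \ell^2_{sl}(q) - \Pr_P(L^\flat_A(q) . L_A(q)).$$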

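Above, $\mathit{weight}_P(w)$ for a PA $P = (\alpha, \gamma, \{\Delta_a\}_{a \in \Sigma})$ and a word $w \in \Sigma^*$ is defined as $\mathit{weight}_P(w) = \alpha^\top \cdot \Delta_w \cdot \mathbf{1}$ (i.e., similarly to $\Pr_P(w)$ but with the final weights $\gamma$ discarded), and $\mathit{weight}_P(L)$ for $L \subseteq \Sigma^*$ is defined as $\mathit{weight}_P(L) = \sum_{w \in L} \mathit{weight}_P(w)$.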
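The two quantities differ only in the final vector, which the following sketch makes explicit. The two-state PA is hypothetical (in each row, the final weight plus the outgoing transition weights sum to 1); $\Delta_w$ is the product $\Delta_{a_1} \cdots \Delta_{a_n}$, applied left to right to the row vector $\alpha^\top$.

```python
import numpy as np

# Hypothetical two-state PA P = (alpha, gamma, {Delta_a}) over {a, b}.
alpha = np.array([1.0, 0.0])
gamma = np.array([0.2, 0.5])
delta = {
    "a": np.array([[0.3, 0.3], [0.0, 0.25]]),
    "b": np.array([[0.2, 0.0], [0.25, 0.0]]),
}


def forward(word):
    """Row vector alpha^T . Delta_w for w = a_1 ... a_n."""
    v = alpha
    for a in word:
        v = v @ delta[a]
    return v


def weight_p(word):
    return float(forward(word).sum())  # alpha^T . Delta_w . 1


def pr_p(word):
    return float(forward(word) @ gamma)  # alpha^T . Delta_w . gamma


def weight_p_lang(language):
    """weight_P(L) for a finite language L."""
    return sum(weight_p(w) for w in language)


print(weight_p("a"), pr_p("a"), weight_p_lang({"a", "b"}))
```

Here $\mathit{weight}_P(a) = 0.6$ while $\Pr_P(a) = 0.21$: the weight also covers every continuation of $a$, not only the word $a$ itself.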

Intuitively, the state labelling $\ell^1_{sl}(q)$ computes the probability that $q$ is reached from an initial state, so if $q$ is pumped up with all possible word endings, this is the maximum possible error introduced by the added word endings. This has the following sources of imprecision: (1) the probability of some words may be included twice, e.g., when $L^\flat_A(q) = \{a, ab\}$, the probabilities of all words from $\{ab\}.\Sigma^*$ are included twice in $\ell^1_{sl}(q)$ because $\{ab\}.\Sigma^* \subseteq \{a\}.\Sigma^*$, and (2) $\ell^1_{sl}(q)$ can also contain probabilities of words that are already accepted on a run traversing $q$. The state labelling $\ell^2_{sl}$ deals with (1) by considering the probability of the language $L^\flat_A(q).\Sigma^*$, and $\ell^3_{sl}$ deals also with (2) by subtracting from the result of $\ell^2_{sl}$ the probabilities of the words that pass through $q$ and are accepted.
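The double counting in the example $L^\flat_A(q) = \{a, ab\}$ can be checked numerically. The sketch assumes a hypothetical one-state PA with $\Pr_P(w) = 0.5 \cdot 0.25^{|w|}$, and it approximates $\Pr_P(L^\flat_A(q).\Sigma^*)$ by summing over all words up to length 12 (a truncation chosen only for illustration).

```python
from itertools import product

import numpy as np

# Hypothetical one-state PA over {a, b}: Pr_P(w) = 0.5 * 0.25**len(w).
ALPHA = np.array([1.0])
GAMMA = np.array([0.5])
DELTA = {"a": np.array([[0.25]]), "b": np.array([[0.25]])}


def forward(word):
    v = ALPHA
    for c in word:
        v = v @ DELTA[c]
    return v


def weight(word):
    return float(forward(word).sum())


def pr(word):
    return float(forward(word) @ GAMMA)


banguage = {"a", "ab"}  # the example L^flat_A(q) = {a, ab} from the text

# l^1_sl(q): "ab" contributes although {ab}.Sigma* is inside {a}.Sigma*.
l1 = sum(weight(u) for u in banguage)

# l^2_sl(q) = Pr_P(banguage . Sigma*), approximated by summing Pr_P over all
# words of length <= 12 with a prefix in the banguage (each word once).
l2 = sum(
    pr(w)
    for n in range(13)
    for w in ("".join(t) for t in product("ab", repeat=n))
    if any(w.startswith(u) for u in banguage)
)
print(l1, round(l2, 4))
```

For this PA, $\ell^1_{sl}(q) = 0.3125$, whereas the truncated sum approaches $\Pr_P(\{a\}.\Sigma^*) = 0.25$; the surplus $0.0625$ is exactly $\mathit{weight}_P(ab)$.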

The computation of the state labellings for the self-loop reduction is done in 
a similar way as the computation of the state labellings for the pruning reduction 
(cf. Sect. 4.3). For a computation of $\mathit{weight}_P(L)$, one can use the same algorithm
as for $\Pr_P(L)$; only the final vector of the PA $P$ is set to $\mathbf{1}$. The correctness of
Algorithms 1 and 2 when instantiated using the self-loop reduction is stated in 
the following lemma. 


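Lemma 8. For every $x \in \{1, 2, 3\}$, the functions $\mathrm{reduce}_{sl}$, $\mathrm{error}_{sl}$, and $\mathrm{label}^x_{sl}$ satisfy C1. Moreover, consider an NFA $A$, a PA $P$, and let $\ell^x_{sl} = \mathrm{label}^x_{sl}(A, P)$ for $x \in \{1, 2, 3\}$. Then, for each $q \in Q[A]$, we have $\ell^1_{sl}(q) \geq \ell^2_{sl}(q) \geq \ell^3_{sl}(q)$.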


5 Reduction of NFAs in Network Intrusion Detection Systems


We have implemented our approach in a Python prototype named APPREAL 
(APProximate REduction of Automata and Languages) and evaluated it on the 
use case of network intrusion detection using SNORT [1], a popular open source 
NIDS. The version of APPREAL used for the evaluation in the current paper is 
available as an artifact [44] for the TACAS’18 artifact virtual machine [45]. 


5.1 Network Traffic Model 


The reduction we describe in this paper is driven by a probabilistic model repre- 
senting a distribution over X*, and the formal guarantees are also wrt this model. 
We use learning to obtain a model of network traffic over the 8-bit ASCII alpha- 
bet at a given network point. Our model is created from several gigabytes of 
network traffic from a measuring point of the CESNET Internet provider con- 
nected to a 100 Gbps backbone link (unfortunately, we cannot provide the traffic 
dump since it may contain sensitive data). 

Learning a PA representing the network traffic faithfully is hard. The PA cannot be too specific: although the number of different packets that can occur is finite, it is still extremely large (a conservative estimate assuming the most common scenario Ethernet/IPv4/TCP would still yield a number over $2^{10{,}000}$). If we assigned non-zero probabilities only to the packets from the dump (which are fewer than $2^{30}$), the obtained model would completely ignore virtually all packets that might appear on the network, and, moreover, the model would also be very large (millions of states), making it difficult to use in our algorithms. A generalization of the obtained traffic is therefore needed.


4 https://github.com/vhavlena/appreal/tree/tacas18.


Approximate Reduction of Finite Automata 167 


A natural solution is to exploit results from the area of PA learning, such 
as [39,40]. Indeed, we experimented with the use of ALERGIA [39], a learning 
algorithm that constructs a PA from a prefix tree (where edges are labelled 
with multiplicities) by merging nodes that are “similar.” The automata that we 
obtained were, however, too general. In particular, the constructed automata 
destroyed the structure of network protocols—the merging was too permissive 
and the generalization merged distant states, which introduced loops over a very 
large substructure in the automaton (such a case usually does not correspond 
to the design of network protocols). As a result, the obtained PA more or less 
represented the Poisson distribution, having essentially no value for us. 

In Sect. 5.2, we focus on the detection of malicious traffic transmitted over 
HTTP. We take advantage of this fact and create a PA representing the traffic 
while taking into account the structure of HTTP. We start by manually creating 
a DFA that represents the high-level structure of HTTP. Then, we proceed by 
feeding 34,191 HTTP packets from our sample into the DFA, at the same time 
taking notes about how many times every state is reached and how many times 
every transition is taken. The resulting PA $P_{\mathrm{HTTP}}$ (of 52 states) is then obtained
from the DFA and the labels in the obvious way. 
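One natural reading of "in the obvious way" is relative-frequency (maximum-likelihood) estimation from the recorded counts. The sketch below assumes exactly that: each transition weight is the number of times the transition was taken divided by the number of visits of its source state, and the final weight is the fraction of sample runs that ended in the state. The function name, the toy DFA, and the samples are hypothetical.

```python
from collections import Counter


def pa_from_counts(dfa_delta, init, samples):
    """Estimate PA weights by relative frequencies of DFA runs on samples.

    delta[(p, a, q)] = #(transition p -a-> q taken) / #(visits of p)
    gamma[p]         = #(samples ending in p)        / #(visits of p)
    """
    visits, ends, moves = Counter(), Counter(), Counter()
    for word in samples:
        p = init
        for a in word:
            visits[p] += 1
            q = dfa_delta[(p, a)]  # assumes the DFA can read every sample
            moves[(p, a, q)] += 1
            p = q
        visits[p] += 1
        ends[p] += 1
    alpha = {init: 1.0}
    gamma = {p: ends[p] / visits[p] for p in visits}
    delta = {(p, a, q): c / visits[p] for (p, a, q), c in moves.items()}
    return alpha, gamma, delta


dfa = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
alpha, gamma, delta = pa_from_counts(dfa, 0, ["ab", "a", "b"])
print(gamma, delta)
```

By construction, for every visited state the final weight and the outgoing transition weights sum to 1.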

The described method yields automata that are much better than those 
obtained using ALERGIA in our experiments. A disadvantage of the method 
is that it is only semi-automatic—the basic DFA needed to be provided by an 
expert. We have yet to find an algorithm that would suit our needs for learning 
more general network traffic. 


5.2 Evaluation 


We start this section by introducing the experimental setting, namely, the inte- 
gration of our reduction techniques into the tool chain implementing efficient 
regex matching, the concrete settings of APPREAL, and the evaluation environ- 
ment. Afterwards, we discuss the results evaluating the quality of the obtained 
approximate reductions as well as of the provided error bounds. Finally, we 
present the performance of our approach and discuss its key aspects. Due to 
the lack of space, we selected the most interesting results demonstrating the 
potential as well as the limitations of our approach. 


General Setting. SNORT detects malicious network traffic based on rules that 
contain conditions. The conditions may take into consideration, among others, 
network addresses, ports, or Perl compatible regular expressions (PCREs) that 
the packet payload should match. In our evaluation, we always select a sub- 
set of SNORT rules, extract the PCREs from them, and use NETBENCH [20] to 
transform them into a single NFA A. Before applying APPREAL, we use the state- 
of-the-art NFA reduction tool REDUCE [41] to decrease the size of A. REDUCE 
performs a language-preserving reduction of A using advanced variants of sim- 
ulation [31] (in the experiment reported in Table 3, we skip the use of REDUCE 
at this step as discussed in the performance evaluation). The automaton $A^{\mathrm{red}}$
obtained as the result of REDUCE is the input of APPREAL, which performs one
of the approximate reductions from Sect. 4 wrt the traffic model $P_{\mathrm{HTTP}}$, yielding
$A^{\mathrm{app}}$. After the approximate reduction, we use REDUCE once more and
obtain the result $A'$.


Settings of APPREAL. In the use case of NIDS pre-filtering, it may be impor- 
tant to never introduce a false negative, i.e., to never drop a malicious packet. 
Therefore, we focus our evaluation on the self-loop reduction (Sect. 4.4). In particular, we use the state labelling function $\mathrm{label}^2_{sl}$, since it provides a good trade-off between the precision and the computational demands (recall that the computation of $\mathrm{label}^2_{sl}$ can exploit the “tentacle” structure of the NFAs we work with). We give more attention to the size-driven reduction (Sect. 4.1) since, in our setting, a bound on the available FPGA resources is typically given and the task is to create an NFA with the smallest error that fits inside. The order $\preceq_{\ell^2_{sl}}$ over states used in Sects. 4.1 and 4.2 is defined as $s \preceq_{\ell^2_{sl}} s' \iff \ell^2_{sl}(s) \leq \ell^2_{sl}(s')$.
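To illustrate how the order $\preceq_{\ell^2_{sl}}$ drives a size-driven reduction, here is a simplified greedy loop. It is not Algorithm 1 of Sect. 4.1 itself. It assumes a hypothetical "tentacle" NFA (a shared initial state plus disjoint chains), in which a self-loop at a chain state cuts off the rest of that chain, and it estimates the error by summing the labels of the shallowest selected state per chain, since deeper selected states become unreachable. All names and label values are hypothetical.

```python
# Hypothetical "tentacle" NFA: a shared initial state plus disjoint chains.
# State (t, j) is the j-th state of tentacle t (1-based); a self-loop at
# (t, j) makes (t, j+1), ..., (t, len_t) unreachable, so trim removes them.


def size_after(tentacles, V):
    """Number of states left after the self-loop reduction with set V."""
    size = 1  # the shared initial state
    for t, length in tentacles.items():
        cuts = [j for (u, j) in V if u == t]
        size += min(cuts) if cuts else length
    return size


def error_bound(labels, V):
    """Sum of labels of the shallowest selected state of each tentacle
    (deeper selected states are unreachable and add no error)."""
    shallowest = {}
    for t, j in V:
        shallowest[t] = min(j, shallowest.get(t, j))
    return sum(labels[(t, j)] for t, j in shallowest.items())


def greedy_size_driven(tentacles, labels, n):
    """Add states in increasing order of their labels until size <= n."""
    V = set()
    for q in sorted(labels, key=lambda s: (labels[s], s)):
        if size_after(tentacles, V) <= n:
            break
        V.add(q)
    return V, size_after(tentacles, V), error_bound(labels, V)


tentacles = {"x": 3, "y": 2}
labels = {("x", 1): 0.1, ("x", 2): 0.01, ("x", 3): 0.001,
          ("y", 1): 0.2, ("y", 2): 0.02}
print(greedy_size_driven(tentacles, labels, n=4))
```

Processing states from the smallest label upwards favours deep, rarely reached states, so the error grows slowly while the automaton shrinks to the requested size.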


Evaluation Environment. All experiments were run on a 64-bit LINUX DEBIAN
workstation with the Intel Core(TM) i5-661 CPU running at 3.33 GHz with
16 GiB of RAM.


Description of Tables. In the caption of every table, we provide the name of 
the input file (in the directory regexps/tacas18/ of the repository of APPREAL) 
with the selection of SNORT regexes used in the particular experiment, together 
with the type of the reduction (size- or error-driven). All reductions are over- 
approximating (self-loop reduction). We further provide the size of the input 
automaton $|A|$, the size after the initial processing by REDUCE ($|A^{\mathrm{red}}|$), and the
time of this reduction (time(REDUCE)). Finally, we list the times of computing
the state labelling $\mathrm{label}^2_{sl}$ on $A^{\mathrm{red}}$ (time($\mathrm{label}^2_{sl}$)) and the exact probabilistic distance
(time(Exact)), and also the number of look-up tables (LUTs($A^{\mathrm{red}}$)) consumed
on the targeted FPGA (Xilinx Virtex 7 H580T) when $A^{\mathrm{red}}$ was synthesized
(more on this in Sect. 5.3). The meaning of the columns in the tables is the
following: 


k/ε is the parameter of the reduction. In particular, $k$ is used for the size-driven reduction and denotes the desired reduction ratio $k = \frac{n}{|A^{\mathrm{red}}|}$ for an input NFA $A^{\mathrm{red}}$ and the desired size of the output $n$. On the other hand, $\epsilon$ is the desired maximum error on the output for the error-driven reduction.
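Given $k$, the desired output size follows from the ratio. The sketch assumes that $n$ is rounded down; with that assumption it reproduces the $|A^{\mathrm{app}}|$ column of Tables 1-3 (for Table 3, the ratio is taken wrt the unreduced $A_{\mathrm{bd}}$). Exact fractions avoid floating-point surprises at the rounding boundary; the function name is hypothetical.

```python
import math
from fractions import Fraction


def target_size(k, n_states):
    """Desired size n = floor(k * |A^red|) for a reduction ratio k."""
    return math.floor(Fraction(k) * n_states)


print([target_size(k, 98) for k in ("0.1", "0.2", "0.3", "0.4", "0.5", "0.6", "0.8")])
# [9, 19, 29, 39, 49, 58, 78]
```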

$|A^{\mathrm{app}}|$ shows the number of states of the automaton $A^{\mathrm{app}}$ after the reduction by APPREAL and the time the reduction took (we omit it when it is not interesting).

$|A'|$ contains the number of states of the NFA $A'$ obtained after applying REDUCE on $A^{\mathrm{app}}$ and the time used by REDUCE at this step (omitted when not interesting).
Table 1. Results for the http-malicious regex, $|A_{\mathrm{mal}}| = 249$, $|A^{\mathrm{red}}_{\mathrm{mal}}| = 98$, time(REDUCE) = 3.5 s, time($\mathrm{label}^2_{sl}$) = 38.7 s, time(Exact) = 3.8-6.5 s, and LUTs($A^{\mathrm{red}}_{\mathrm{mal}}$) = 382.

(a) size-driven reduction
k    |A^app|      |A'|        Error bound  Exact error  Traffic error  LUTs
0.1  9 (0.65 s)   9 (0.4 s)   0.0704       0.0704       0.0685         -
0.2  19 (0.66 s)  19 (0.5 s)  0.0677       0.0677       0.0648         -
0.3  29 (0.69 s)  26 (0.9 s)  0.0279       0.0278       0.0598         154
0.4  39 (0.68 s)  36 (1.1 s)  0.0032       0.0032       0.0008         -
0.5  49 (0.68 s)  44 (1.4 s)  2.8e-05      2.8e-05      4.1e-06        -
0.6  58 (0.69 s)  49 (1.7 s)  8.7e-08      8.7e-08      0.0            224
0.8  78 (0.69 s)  75 (2.7 s)  2.4e-17      2.4e-17      0.0            297

(b) error-driven reduction
ε      |A^app|  |A'|  Error bound  Exact error  Traffic error
0.08   3        3     0.0724       0.0724       0.0720
0.07   4        4     0.0700       0.0700       0.0683
0.04   35       32    0.0267       0.0212       0.0036
0.02   36       33    0.0105       0.0096       0.0032
0.001  41       38    0.0005       0.0005       0.0003
1e-04  47       41    7.7e-05      7.7e-05      1.2e-05
1e-05  51       47    6.6e-06      6.6e-06      0.0


Error bound shows the estimation of the error of A’ as determined by the 
reduction itself, i.e., it is the probabilistic distance computed by the function 
error in Sect. 4. 

Exact error contains the values of $d_{P_{\mathrm{HTTP}}}(A, A')$ that we computed after the reduction in order to evaluate the precision of the result given in Error bound. The computation of this value is very expensive (time(Exact)) since it inherently requires determinisation of the whole automaton $A$. We do not provide it in Table 3 (presenting the results for the automaton $A_{\mathrm{bd}}$ with 1,352 states) because the determinisation ran out of memory (the step is not required in the reduction process).

Traffic error shows the error that we obtained when comparing $A'$ with $A$ on an HTTP traffic sample, in particular the ratio of packets misclassified by $A'$ to the total number of packets in the sample (242,468). Comparing Exact error with Traffic error gives us feedback about the fidelity of the traffic model $P_{\mathrm{HTTP}}$. We note that there are no guarantees on the relationship between Exact error and Traffic error.
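Traffic error is a plain misclassification ratio, as in the following sketch. The substring matchers stand in for $A$ and $A'$ and are purely hypothetical (any two acceptance predicates work). The last line converts a traffic error of 0.0003 on the 242,468-packet sample back into a packet count.

```python
def traffic_error(accepts_a, accepts_a_prime, packets):
    """Fraction of packets on which A and A' disagree (misclassified by A')."""
    wrong = sum(1 for p in packets if accepts_a(p) != accepts_a_prime(p))
    return wrong / len(packets)


# Hypothetical matchers: A' over-approximates A, as a self-loop reduction does.
def accepts_a(payload):
    return "evil" in payload


def accepts_a_prime(payload):
    return "ev" in payload


print(traffic_error(accepts_a, accepts_a_prime, ["GET /", "evil", "evening", "devs"]))  # 0.5

# A traffic error of 0.0003 on the 242,468-packet sample is about 73 packets.
print(round(0.0003 * 242_468))  # 73
```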

LUTs is the number of LUTs consumed by A’ when synthesized into the FPGA. 
Hardware synthesis is a costly step so we provide this value only for selected 
NFAs. 


Approximation Errors 

Table 1 presents the results of the self-loop reduction for the NFA $A_{\mathrm{mal}}$ describing http-malicious regexes. We can observe that the differences between the upper bounds on the probabilistic distance and its real value are negligible (typically in the order of $10^{-4}$ or less). We can also see that the probabilistic distance agrees with the traffic error. This indicates a good quality of the traffic model employed in the reduction process. Further, we can see that our approach can provide useful trade-offs between the reduction error and the reduction factor. Finally, Table 1 shows that a significant reduction (from 35 to 4 states of $A^{\mathrm{app}}$) is obtained when the error threshold $\epsilon$ is increased from 0.04 to 0.07.
Table 2 presents the results of the size-driven self-loop reduction for the NFA $A_{\mathrm{att}}$ describing http-attacks regexes. We can observe that the error bounds again provide a very good approximation of the real probabilistic distance. On the other hand, the difference between the probabilistic distance and the traffic error is larger than for $A_{\mathrm{mal}}$. Since all experiments use the same probabilistic automaton and the same traffic, this discrepancy is attributed to the different set of packets that are incorrectly accepted by the reduced automata. If the probability of these packets is adequately captured in the traffic model, the difference between the distance and the traffic error is small, and vice versa. This also explains an even larger difference in Table 3 (presenting the results for $A_{\mathrm{bd}}$ constructed from http-backdoor regexes) for $k \in (0.2, 0.4)$. Here, the traffic error is very small and caused by a small set of packets (approx. 70), whose probability is not correctly captured in the traffic model. Despite this problem, the results clearly show that our approach still provides significant reductions while keeping the traffic error small: about a 5-fold reduction is obtained for the traffic error 0.03% and a 10-fold reduction is obtained for the traffic error 6.3%. We discuss the practical impact of such a reduction in Sect. 5.3.

Table 2. Results for the http-attacks regex, size-driven reduction, $|A_{\mathrm{att}}| = 142$, $|A^{\mathrm{red}}_{\mathrm{att}}| = 112$, time(REDUCE) = 7.9 s, time($\mathrm{label}^2_{sl}$) = 28.3 min, time(Exact) = 14.0-16.4 min.

k    |A^app|      |A'|        Error bound  Exact error  Traffic error
0.1  11 (0.1 s)   5 (0.4 s)   1.0          0.9972       0.9957
0.2  22 (0.1 s)   14 (0.6 s)  1.0          (illegible)  0.0213
0.3  33 (0.1 s)   24 (0.7 s)  0.081        0.0770       0.0067
0.4  44 (0.1 s)   37 (1.6 s)  0.0005       0.0005       0.0010
0.5  56 (0.1 s)   49 (1.2 s)  3.3e-06      3.3e-06      0.0010
0.6  67 (0.1 s)   61 (1.9 s)  1.2e-09      1.2e-09      8.7e-05
0.7  78 (0.1 s)   72 (2.4 s)  4.8e-12      4.8e-12      1.2e-05
0.9  100 (0.1 s)  93 (4.7 s)  3.7e-16      1.1e-15      0.0




Performance of the Approximate Reduction 


In all our experiments (Tables 1, 2, and 3), we can observe that the most time-consuming step of the reduction process is the computation of state labellings (it takes at least 90% of the total time). The crucial observation is that the structure of the NFAs fundamentally affects the performance of this step. Although after REDUCE, the size of $A_{\mathrm{mal}}$ is very similar to the size of $A_{\mathrm{att}}$, computing $\mathrm{label}^2_{sl}$ for $A_{\mathrm{att}}$ takes much more time (28.3 min vs. 38.7 s). The key reason behind this slowdown is the determinisation (or alternatively disambiguation) process required by the product construction underlying the state labelling computation (cf. Sect. 4.4). For $A_{\mathrm{att}}$, the process results in a significantly larger product when compared to the product for $A_{\mathrm{mal}}$. The size of the product directly determines the time and space complexity of solving the linear equation system required for computing the state labelling.

Table 3. Results for the http-backdoor regex, size-driven reduction, $|A_{\mathrm{bd}}| = 1{,}352$, time($\mathrm{label}^2_{sl}$) = 19.9 min, LUTs($A^{\mathrm{red}}_{\mathrm{bd}}$) = 2,266.

k    |A^app|         |A'|           Error bound  Traffic error  LUTs
0.1  135 (1.2 min)   8 (2.6 s)      1.0          0.997          202
0.2  270 (1.2 min)   111 (5.2 s)    0.0012       0.0631         579
0.3  405 (2 min)     233 (9.8 s)    3.4e-08      0.0003         894
0.4  540 (1.3 min)   351 (21.7 s)   1.0e-12      0.0003         1063
0.5  676 (1.3 min)   473 (41.8 s)   1.2e-17      0.0            1249
0.7  946 (1.4 min)   739 (2.1 min)  8.3e-30      0.0            1735
0.9  1216 (1.5 min)  983 (5.6 min)  1.3e-52      0.0            2033
As explained in Sect. 4, the computation of the state labelling $\mathrm{label}^2_{sl}$ can exploit the “tentacle” structure of the NFAs appearing in NIDSes and thus can be done component-wise. On the other hand, our experiments reveal that the use of REDUCE typically breaks this structure, and thus the component-wise computation cannot be used effectively. For the NFA $A_{\mathrm{mal}}$, this behaviour does not have any major performance impact, as the determinisation leads to a moderate-sized automaton and the state labelling computation takes less than 40 s. On the other hand, this behaviour has a dramatic effect for the NFA $A_{\mathrm{att}}$. By disabling the initial application of REDUCE and thus preserving the original structure of $A_{\mathrm{att}}$, we were able to speed up the state label computation from 28.3 min to 1.5 min. Note that other steps of the approximate reduction took a similar time as before disabling REDUCE and also that the trade-offs between the error and the reduction factor were similar. Surprisingly, disabling REDUCE made the computation of the exact probabilistic distance infeasible because the determinisation ran out of memory.

Due to the size of the NFA $A_{\mathrm{bd}}$, the impact of disabling the initial application of REDUCE is even more fundamental. In particular, computing the state labelling took only 19.9 min, in contrast to running out of memory when REDUCE is applied in the first step (therefore, the input automaton is not processed by REDUCE in Table 3; we still give the number of LUTs of its reduced version for comparison, though). Note that the size of $A_{\mathrm{bd}}$ also slows down other reduction steps (the greedy algorithm and the final REDUCE reduction). We
can, however, clearly see that computing the state labelling is still the most 
time-consuming step. 


5.3 The Real Impact in an FPGA-Accelerated NIDS 


Further, we also evaluated some of the obtained automata in the setting of [5] implementing a high-speed NIDS pre-filter. In that setting, the amount of resources available for the regex matching engine is 15,000 LUTs⁵ and the frequency of the engine is 200 MHz. We synthesized NFAs that use a 32-bit-wide data path, corresponding to processing 4 ASCII characters at once, which is, according to the analysis in [5], the best trade-off between the utilization of the chip resources and the maximum achievable frequency. A simple analysis shows that the throughput of one automaton is 6.4 Gbps (32 bits at 200 MHz), so in order to reach the desired link speed of 100 Gbps, 16 units are required, and 63 units are needed to handle 400 Gbps. With the given amount of LUTs, we are therefore bounded by 937 LUTs per unit for 100 Gbps and 238 LUTs per unit for 400 Gbps.
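The arithmetic behind these bounds can be reproduced directly from the numbers in the text (15,000 LUTs, 200 MHz, a 32-bit data path). The sketch uses integer Mbps to keep the ceiling divisions exact; the constant and function names are ours.

```python
LUT_BUDGET = 15_000        # LUTs available for the regex matching engine
FREQ_MHZ = 200             # clock frequency of the engine
DATA_PATH_BITS = 32        # 4 ASCII characters per clock cycle

UNIT_MBPS = DATA_PATH_BITS * FREQ_MHZ  # 6,400 Mbps = 6.4 Gbps per unit


def units_needed(link_gbps):
    """Smallest number of parallel units whose throughput reaches the link."""
    return -(-link_gbps * 1000 // UNIT_MBPS)  # integer ceiling division


def luts_per_unit(link_gbps):
    """LUTs each unit may use so that all needed units fit on the chip."""
    return LUT_BUDGET // units_needed(link_gbps)


for link in (100, 400):
    print(link, units_needed(link), luts_per_unit(link))
# 100 16 937
# 400 63 238
```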

We focused on the consumption of LUTs by an implementation of the regex matching engines for http-backdoor ($A^{\mathrm{red}}_{\mathrm{bd}}$) and http-malicious ($A^{\mathrm{red}}_{\mathrm{mal}}$).

- 100 Gbps: For this speed, $A^{\mathrm{red}}_{\mathrm{mal}}$ can be used without any approximate reduction, as it is small enough to fit in the available space. On the other hand, $A^{\mathrm{red}}_{\mathrm{bd}}$ without the approximate reduction is way too large to fit (at most 6 units fit inside the available space, yielding a throughput of only 38.4 Gbps, which is unacceptable). The column LUTs in Table 3 shows that using our framework, we are able to reduce $A^{\mathrm{red}}_{\mathrm{bd}}$ such that it uses 894 LUTs (for $k = 0.3$), and so all the needed 16 units fit into the FPGA, yielding a throughput over 100 Gbps and a theoretical error bound of a false positive $\leq 3.4 \times 10^{-8}$ wrt the model $P_{\mathrm{HTTP}}$.

- 400 Gbps: Regex matching at this speed is extremely challenging. The only reduced version of $A^{\mathrm{red}}_{\mathrm{bd}}$ that fits in the available space is the one for the value $k = 0.1$, with an error bound of almost 1. The situation is better for $A^{\mathrm{red}}_{\mathrm{mal}}$. In the exact version, at most 39 units can fit inside the FPGA, with a maximum throughput of 249.6 Gbps. On the other hand, when using our approximate reduction framework, we are able to place 63 units into the FPGA, each of the size 224 LUTs ($k = 0.6$), with a throughput over 400 Gbps and a theoretical error bound of a false positive $\leq 8.7 \times 10^{-8}$ wrt the model $P_{\mathrm{HTTP}}$.

5 We omit the analysis of flip-flop consumption because in our setting it is dominated by the LUT consumption.
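The unit counts quoted above follow from dividing the LUT budget by the size of one synthesized automaton. The sketch checks the four configurations discussed, using the LUT values from Tables 1 and 3; the labels in the loop are only descriptive strings.

```python
LUT_BUDGET = 15_000
UNIT_MBPS = 32 * 200  # 32-bit data path at 200 MHz


def fit(luts_per_unit):
    """How many units of one automaton fit on the chip, and their throughput."""
    units = LUT_BUDGET // luts_per_unit
    return units, units * UNIT_MBPS / 1000  # (units, Gbps)


for name, luts in [("A_mal^red, exact", 382), ("A_bd^red, exact", 2266),
                   ("A_bd, k = 0.3", 894), ("A_mal, k = 0.6", 224)]:
    units, gbps = fit(luts)
    print(f"{name}: {units} units, {gbps:.1f} Gbps")
```

With 224 LUTs per unit, even 66 units would fit, comfortably above the 63 needed for 400 Gbps.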


6 Conclusion 


We have proposed a novel approach for approximate reduction of NFAs used in network traffic filtering. Our approach is based on a probabilistic distance between the original and the reduced automaton, defined using a probabilistic model of the input network traffic, which characterizes the significance of particular packets. We characterized the computational complexity of approximate reductions based on the described distance and proposed a sequence of heuristics allowing one to perform the approximate reduction in an efficient way. Our experimental results are quite encouraging and show that we can often achieve a very significant reduction for a negligible loss of precision. We showed that, using our approach, FPGA-accelerated network filtering at high traffic speeds can be applied to regexes of malicious traffic where it could not be applied before.

In the future, we plan to investigate other approximate reductions of the NFAs, possibly using some variant of abstraction from abstract regular model checking [42], adapted for the given probabilistic setting. Another important issue for the future is to develop better ways of learning a suitable probabilistic model of the input traffic.
Abstract. Automated synthesis of reactive systems from specifications has been a topic of research for decades. Recently, a variety of approaches have been proposed to extend synthesis of reactive systems from propositional specifications towards specifications over rich theories. We propose a novel, completely automated approach to program synthesis which reduces the problem to deciding the validity of a set of $\forall\exists$-formulas. In the spirit of IC3/PDR, our problem space is recursively refined by blocking out regions of unsafe states, aiming to discover a fixpoint that describes safe reactions. If such a fixpoint is found, we construct a witness that is directly translated into an implementation. We implemented the algorithm on top of the JKIND model checker, and exercised it against contracts written using the Lustre specification language. Experimental results show how the new algorithm outperforms JKIND’s already existing synthesis procedure based on k-induction and addresses soundness issues in the k-inductive approach with respect to unrealizable results.


1 Introduction 


Program synthesis is one of the most challenging problems in computer science. The objective is to define a process to automatically derive implementations that are guaranteed to comply with specifications expressed in the form of logic formulas. The problem has seen increased popularity in recent years, mainly due to the capabilities of modern symbolic solvers, including Satisfiability Modulo Theories (SMT) [1] tools, to compute compact and precise regions that describe
© The Author(s) 2018 
under which conditions an implementation exists for the given specification [25]. As a result, the problem has been well-studied for the area of propositional specifications (see Gulwani [15] for a survey), and approaches have been proposed to tackle challenges involving richer specifications. Template-based techniques focus on synthesizing programs that match a certain shape (the template) [28], while inductive synthesis uses the idea of refining the problem space using counterexamples, to converge to a solution [12]. A different category is that of functional synthesis, in which the goal is to construct functions from pre-defined input/output relations [22].

Our goal is to effectively synthesize programs from safety specifications writ- 
ten in the Lustre [18] language. These specifications are structured in the form 
of Assume-Guarantee contracts, similarly to approaches in Linear Temporal 
Logic [11]. In prior work, we developed a solution to the synthesis problem which 
is based on k-induction [14,19,21]. Despite showing good results, the approach 
suffers from soundness problems with respect to unrealizable results; a contract 
could be declared as unrealizable, while an actual implementation exists. In 
this work, we propose a novel approach that is a direct improvement over the 
k-inductive method in two important aspects: performance and generality. On 
all models that can be synthesized by k-induction, the new algorithm always
outperforms it in terms of synthesis time while yielding roughly similar code
sizes and execution times for the generated code. More importantly, the new 
algorithm can synthesize a strictly larger set of benchmark models, and comes 
with an improved termination guarantee: unlike in k-induction, if the algorithm 
terminates with an “unrealizable” result, then there is no possible realization of 
the contract. 

The technique has been used to synthesize contracts involving linear real and 
integer arithmetic (LIRA), but remains generic enough to be extended into sup- 
porting additional theories in the future, as well as to liveness properties that can 
be reduced to safety properties (as in k-liveness [7]). Our approach is completely 
automated and requires no guidance to the tools in terms of user interaction 
(unlike [26,27]), and it is capable of providing solutions without requiring any 
templates, as in e.g., work by Beyene et al. [2]. We were able to automatically 
solve problems that were “hard” and required hand-written templates specialized 
to the problem in [2]. 

The main idea of the algorithm was inspired by induction-based model check- 
ing, and in particular by IC3/Property Directed Reachability (PDR) [4,9]. In 
PDR, the goal is to discover an inductive invariant for a property, by recursively 
blocking generalized regions describing unsafe states. Similarly, we attempt to 
reach a greatest fixpoint that contains states that react to arbitrary environment 
behavior and lead to states within the fixpoint that comply with all guarantees. 
Formally, the greatest fixpoint is sufficient to prove the validity of a $\forall\exists$-formula, which states that for any state and environment input, there exists a system reaction that complies with the specification. Starting from the entire problem space, we recursively block regions of states that violate the contract, using regions of validity that are generated by invalid $\forall\exists$-formulas. If the refined $\forall\exists$-formula is valid, we reach a fixpoint which can effectively be used by the specified transition relation to provide safe reactions to environment inputs. We then extract
a witness for the formula’s satisfiability, which can be directly transformed into 
the language intended for the system’s implementation. 
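The fixpoint idea can be illustrated on a finite state space by explicit enumeration. The sketch below is a classical safety-game greatest-fixpoint computation, not the paper's algorithm (which works symbolically over LIRA and obtains regions of validity and Skolem witnesses from AE-VAL). It repeatedly removes states for which some input admits no output leading back into the candidate set, then reads off a reaction for every state and input. The tank game and all names are hypothetical.

```python
def greatest_fixpoint(states, inputs, outputs, step, safe):
    """Largest set W of safe states such that, for every input, some output
    leads back into W, i.e., a finite-state analogue of the validity of
    forall s in W, forall i, exists o: step(s, i, o) in W."""
    W = {s for s in states if safe(s)}
    while True:
        keep = {
            s for s in W
            if all(any(step(s, i, o) in W for o in outputs) for i in inputs)
        }
        if keep == W:
            return W
        W = keep  # block the states that cannot react safely and iterate


def witness(W, inputs, outputs, step):
    """A reaction for every (state, input) pair, read off the fixpoint."""
    return {
        (s, i): next(o for o in outputs if step(s, i, o) in W)
        for s in W
        for i in inputs
    }


# Hypothetical tank game: the environment adds i units, the system drains
# o units in reaction; the level must stay within 0..3.
def step(x, i, o):
    return x + i - o


def safe(x):
    return 0 <= x <= 3


W = greatest_fixpoint(range(6), (0, 1, 2), (0, 2), step, safe)
print(sorted(W), witness(W, (0, 1, 2), (0, 2), step)[(3, 2)])  # [0, 1, 2, 3] 2
print(greatest_fixpoint(range(6), (0, 1, 2, 3), (0, 2), step, safe))  # set()
```

When the environment may add 3 units, the blocking cascades until no state is left, which corresponds to an "unrealizable" verdict.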

The algorithm was implemented as a feature in the JKIND model checker and 
is based on the general concept of extracting a witness that satisfies a $\forall\exists$-formula,
using the AE-VAL Skolemizer [10, 19]. While AE-VAL was mainly used as a tool 
for solving queries and extracting Skolems in our k-inductive approach, in this 
paper we also take advantage of its capability to generate regions of validity from 
invalid formulas to reach a fixpoint of satisfiable assignments to state variables. 

The contributions of the paper are therefore: 


• A novel approach to synthesis of contracts involving rich theories that is efficient, general, and completely automated (no reliance on templates or user guidance),

• an implementation of the approach in a branch of the JKIND model checker, and

• an experiment over a large suite of benchmark models demonstrating the effectiveness of the approach.


The rest of the paper is organized as follows. Section 2 briefly describes the 
Cinderella-Stepmother problem that we use as an example throughout the paper. 
In Sect. 3, we provide the necessary formal definitions to describe the synthesis algorithm, which is then presented in Sect. 4. We present an evaluation in Sect. 5, including a comparison against an existing method based on k-induction that uses the same input language. Finally, we discuss the differences of our work with closely related ideas in Sect. 6 and conclude in Sect. 7.


2 Overview: The Cinderella-Stepmother Game 


We illustrate the flow of the validity-guided synthesis algorithm using a variation of the minimum-backlog problem, the two-player game between Cinderella and her wicked Stepmother, first expressed by Bodlaender et al. [3].

The main objective for Cinderella (i.e., the reactive system) is to prevent a
collection of buckets from overflowing with water. On the other hand, Cinderella's
Stepmother (i.e., the system's environment) refills the buckets with a predefined
amount of water that is distributed in a random fashion between the buckets. For
the running example, we chose an instance of the game that has been previously
used in template-based synthesis [2]. In this instance, the game is described using
five buckets, where each bucket can contain up to two units of water. Cinderella
has the option to empty two adjacent buckets at each of her turns, while the
Stepmother distributes one unit of water over all five buckets. In the context of
this paper, we use this example to show how a specification is expressed, as well as
how we can synthesize an efficient implementation that describes reactions for
Cinderella such that a bucket overflow is always prevented.
[Fig. 1 contents. Assumptions (Stepmother): $\forall k.\ i_k \ge 0$ and $\sum_{k=1}^{5} i_k = 1$. Implementation (Cinderella): chooses $e$. Guarantees: $b_k = 0$ (initially); $b'_k = i_k$ if $e = k \vee e = k_{prev}$, and $b'_k = b_k + i_k$ otherwise; $b_k \le 2$; where $k_{prev} = 5$ if $k = 1$, and $k - 1$ otherwise.]


Fig. 1. An Assume-Guarantee contract. 


We represent the system requirements using an Assume-Guarantee Contract. 
The assumptions of the contract restrict the possible inputs that the environment 
can provide to the system, while the guarantees describe safe reactions of the 
system to the outside world. 

A (conceptually) simple example is shown in Fig. 1. The contract describes a 
possible set of requirements for a specific instance of the Cinderella-Stepmother 
game. Our goal is to synthesize an implementation that describes Cinderella’s 
winning region of the game. Cinderella in this case is the implementation, as 
shown by the middle box in Fig. 1. Cinderella's inputs are five different values
$i_k$, $1 \le k \le 5$, determined by a random distribution of one unit of water by the
Stepmother. During each of her turns, Cinderella has to make a choice, denoted by
the output variable $e$, such that the buckets $b_k$ do not overflow during the next
action of her Stepmother. We define the contract using the set of assumptions
A (left box in Fig. 1) and the guarantee constraints G (right box in Fig. 1). For
this particular example, it is possible to construct at least one implementation
that satisfies G given A; one such implementation is described in Sect. 4.3. The proof of existence
of such an implementation is the main concept behind the realizability problem,
while the automated construction of a witness implementation is the main focus
of program synthesis.

Given a proof of realizability of the contract in Fig. 1, we seek an
efficient synthesis procedure that can provide an implementation. On the other
hand, consider a variation of the example where A = true. This is a practical
case of an unrealizable contract, as there is no feasible Cinderella implementation
that can correctly react to the Stepmother's actions. For example, without any assumptions
the Stepmother may pour arbitrary amounts of water into the buckets, leading
to an overflow of at least one bucket during each of her turns.
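To make the role of the assumptions concrete, here is a minimal Python sketch of one round of the game in Fig. 1. It assumes empty buckets, capacity 2, and the convention of the Lustre model in Fig. 3 that choice e empties buckets e and e+1 (bucket 5 wraps to bucket 1); any e outside 1 to 5 is treated as skipping the turn. The helper names are illustrative only. It contrasts the variant A = true, where a single hypothetical pour of 2.5 units into one bucket defeats every choice of Cinderella, with a pour that respects A.

```python
from typing import List, Sequence

CAPACITY = 2.0  # each bucket holds at most two units (Sect. 2)

def next_buckets(b: Sequence[float], e: int, pour: Sequence[float]) -> List[float]:
    """Empty buckets e and e+1 (bucket 5 wraps to bucket 1), then add the Stepmother's pour."""
    emptied = {e, e % 5 + 1} if 1 <= e <= 5 else set()  # any other e: Cinderella skips
    return [pour[k - 1] if k in emptied else b[k - 1] + pour[k - 1] for k in range(1, 6)]

def overflows(b: Sequence[float]) -> bool:
    return any(x > CAPACITY for x in b)

empty = [0.0] * 5
choices = range(0, 6)  # 0 stands for "skip"; 1..5 select a pair of adjacent buckets

# Variant with A = true: a hypothetical pour of 2.5 units into bucket 1 beats every choice.
bad_pour = [2.5, 0.0, 0.0, 0.0, 0.0]
print(all(overflows(next_buckets(empty, e, bad_pour)) for e in choices))    # True

# A pour that satisfies A (non-negative, one unit in total) cannot overflow empty buckets.
good_pour = [0.2, 0.2, 0.2, 0.2, 0.2]
print(any(overflows(next_buckets(empty, e, good_pour)) for e in choices))   # False
```

A single round from empty buckets is of course not a proof of realizability; it only shows why the contract with A = true cannot be met.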


3 Background 


We use two disjoint sets, state and inputs, to describe a system. A straightforward
and intuitive way to represent an implementation is by defining a transition
system, composed of an initial state predicate $I(s)$ of type $\mathit{state} \to \mathit{bool}$, as well
as a transition relation $T(s, i, s')$ of type $\mathit{state} \to \mathit{inputs} \to \mathit{state} \to \mathit{bool}$.
Combining the above, we represent an Assume-Guarantee (AG) contract
using a set of assumptions, $A : \mathit{state} \to \mathit{inputs} \to \mathit{bool}$, and a set of guarantees $G$.
The latter is further decomposed into two distinct subsets $G_I : \mathit{state} \to \mathit{bool}$ and
$G_T : \mathit{state} \to \mathit{inputs} \to \mathit{state} \to \mathit{bool}$. $G_I$ defines the set of valid initial
states, and $G_T$ contains constraints that need to be satisfied in every transition
between two states. Importantly, we do not make any distinction between the
internal state variables and the output variables in the formalism. This allows
us to use the state variables to (in some cases) simplify the specification of
guarantees, since a contract might not always be defined over all variables in the
transition system.

Consequently, we can formally define a realizable contract as one for which
any preceding state s can transition into a new state s' that satisfies the
guarantees, assuming valid inputs. For a system to be ever-reactive, these new states
s' should be further usable as preceding states in a future transition. States like
s and s' are called viable if and only if:


$$\mathit{Viable}(s) = \forall i.\ \big(A(s,i) \Rightarrow \exists s'.\ G_T(s,i,s') \wedge \mathit{Viable}(s')\big) \qquad (1)$$


This equation is recursive and we interpret it coinductively, i.e., as a greatest 
fixpoint. A necessary condition, finally, is that the intersection of sets of viable 
states and initial states is non-empty. As such, to conclude that a contract is 
realizable, we require that 


$$\exists s.\ G_I(s) \wedge \mathit{Viable}(s) \qquad (2)$$


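The synthesis problem is therefore to determine an initial state $s_I$ and a function
$f(s, i)$ such that $G_I(s_I) \wedge \mathit{Viable}(s_I)$ and $\forall s, i.\ \mathit{Viable}(s) \wedge A(s, i) \Rightarrow G_T(s, i, f(s, i)) \wedge \mathit{Viable}(f(s, i))$.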
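For intuition, the following Python sketch computes Viable for a hypothetical four-state toy contract over finite domains (not one of the paper's benchmarks) by iterating the right-hand side of Eq. 1 downward from the set of all states, checks Eq. 2, and tabulates one possible strategy f(s, i). It relies on the domains being finite and enumerable; the setting of this paper uses infinite theories handled symbolically.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical toy contract over finite domains (not taken from the paper).
STATES = (0, 1, 2, 3)
INPUTS = (0, 1)

def A(s: int, i: int) -> bool:
    return True                                   # every input is valid

def G_I(s: int) -> bool:
    return s in (1, 2)                            # permitted initial states

def G_T(s: int, i: int, t: int) -> bool:
    if s == 3:
        return False                              # state 3 has no permitted successor
    if s == 2:
        return t == (3 if i == 1 else 0)          # input 1 forces a move into state 3
    return t in (0, 1)

def unfold(V: FrozenSet[int]) -> FrozenSet[int]:
    """Right-hand side of Eq. 1 with Viable replaced by V."""
    return frozenset(s for s in STATES
                     if all(any(G_T(s, i, t) for t in V) for i in INPUTS if A(s, i)))

def viable() -> FrozenSet[int]:
    """Greatest fixpoint: iterate downward from the set of all states."""
    V = frozenset(STATES)
    while unfold(V) != V:
        V = unfold(V)
    return V

def synthesize() -> Optional[Tuple[int, Dict[Tuple[int, int], int]]]:
    V = viable()
    s_init = next((s for s in STATES if G_I(s) and s in V), None)     # Eq. 2
    if s_init is None:
        return None                                                   # unrealizable
    # f(s, i): for each viable state and valid input, some viable G_T-successor.
    f = {(s, i): next(t for t in sorted(V) if G_T(s, i, t))
         for s in sorted(V) for i in INPUTS if A(s, i)}
    return s_init, f

print(sorted(viable()), synthesize())
# [0, 1] (1, {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0})
```

State 2 is an allowed initial state but is not viable, because input 1 forces it into the dead state 3; the initial state 1 is both allowed and viable.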

The intuition behind our proposed algorithm in this paper relies on the dis- 
covery of a fixpoint F that only contains viable states. We can determine whether 
F is a fixpoint by proving the validity of the following formula: 


$$\forall s, i.\ \big(F(s) \wedge A(s,i) \Rightarrow \exists s'.\ G_T(s,i,s') \wedge F(s')\big)$$


If the greatest fixpoint F is non-empty, we check whether it
satisfies $G_I$ for some initial state. If so, we proceed by extracting a witnessing
initial state and a witnessing Skolem function $f(s, i)$ that determines $s'$ and is, by
construction, guaranteed to satisfy the specification.

To achieve both the fixpoint generation and the witness extraction, we depend
on AE-VAL, a solver for ∀∃-formulas.


3.1 Skolem Functions and Regions of Validity 


We rely on AE-VAL [10], an established algorithm for deciding the validity of ∀∃-formulas
and extracting Skolem functions. It takes as input a formula
$\forall x.\ \exists y.\ \Phi(x, y)$, where $\Phi(x, y)$ is quantifier-free. To decide its validity, AE-VAL
first normalizes $\Phi(x, y)$ to the form $S(x) \Rightarrow T(x, y)$ and then attempts to extend
all models of $S(x)$ to models of $T(x, y)$. If such an extension is possible, then
the input formula is valid, and the relationship between $x$ and $y$ is captured in a
Skolem function. Otherwise the formula is invalid, and no Skolem function exists.
We refer the reader to [19] for more details on the Skolem-function generation.

Fig. 2. Region of validity computed for an example requiring AE-VAL to iterate two
times.
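The following Python sketch decides the same ∀∃ question by brute-force enumeration over small finite domains and returns a tabulated Skolem function. It illustrates only what AE-VAL computes, not how: AE-VAL reasons symbolically over infinite theories, whereas `ae_check` (a hypothetical name) simply searches for a witness y for every model of S(x).

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

def ae_check(xs: Iterable[int], ys: Iterable[int],
             S: Callable[[int], bool],
             T: Callable[[int, int], bool]) -> Tuple[bool, Optional[Dict[int, int]]]:
    """Decide  forall x. S(x) => exists y. T(x, y)  by enumeration over finite domains.

    Returns (True, skolem), where skolem maps every model x of S to a witness y,
    or (False, None) as soon as some model of S cannot be extended to T.
    """
    candidates = list(ys)
    skolem: Dict[int, int] = {}
    for x in xs:
        if not S(x):
            continue                      # outside the antecedent: nothing to extend
        witness = next((y for y in candidates if T(x, y)), None)
        if witness is None:
            return False, None            # invalid: no Skolem function exists
        skolem[x] = witness
    return True, skolem

# Every x < 5 has the successor y = x + 1 in 0..9, so the formula is valid.
print(ae_check(range(10), range(10), lambda x: x < 5, lambda x, y: y == x + 1))
# Without the antecedent, x = 9 has no successor in 0..9, so the formula is invalid.
print(ae_check(range(10), range(10), lambda x: True, lambda x, y: y == x + 1))
```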

Our approach presented in this paper relies on the fact that during each run,
AE-VAL iteratively creates a set of formulas $\{P_i(x)\}$, such that each $P_i(x)$ has
a common model with $S(x)$ and $P_i(x) \Rightarrow \exists y.\ T(x, y)$. After $n$ iterations,
AE-VAL establishes a formula $R_n(x) = \bigvee_i P_i(x)$, which by construction implies
$\exists y.\ T(x, y)$. If additionally $S(x) \Rightarrow R_n(x)$, the input formula is valid, and the
algorithm terminates. Figure 2 shows a Venn diagram for an example of the opposite
scenario: $R_2(x) = T_1(x) \vee T_2(x)$, but the input formula is invalid. However,
models of each $S(x) \wedge P_i(x)$ can still be extended to a model of $T(x, y)$.

In general, if after $n$ iterations $S(x) \wedge T(x, y) \wedge \neg R_n(x)$ is unsatisfiable, then
AE-VAL terminates. Note that the formula $\forall x.\ S(x) \wedge R_n(x) \Rightarrow \exists y.\ T(x, y)$ is
valid by construction at any iteration of the algorithm. We say that $R_n(x)$ is a
region of validity, and in this work we are interested in the maximal regions of
validity, i.e., the ones produced by disjoining all $\{P_i(x)\}$ produced by AE-VAL
before termination and conjoining the result with $S(x)$. Throughout the paper, we
assume that all regions of validity are maximal.


Lemma 1. Let $R_n(x)$ be the region of validity returned by AE-VAL for formula
$\forall x.\ S(x) \Rightarrow \exists y.\ T(x, y)$. Then $\forall x.\ S(x) \Rightarrow \big(R_n(x) \Leftrightarrow \exists y.\ T(x, y)\big)$.


Proof. ($\Rightarrow$) By construction of $R_n(x)$.

($\Leftarrow$) Suppose towards contradiction that the formula does not hold. Then
there exists $x_0$ such that $S(x_0) \wedge (\exists y.\ T(x_0, y)) \wedge \neg R_n(x_0)$ holds. But this
directly contradicts the termination condition of AE-VAL. Therefore the
original formula does hold.
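Continuing in the same finite, enumerative style, this hypothetical sketch computes the maximal region of validity for a small example whose input formula is invalid, and checks the equivalence stated in Lemma 1 on every model of S(x). Here the region is obtained directly as the set of models of S(x) that extend to T(x, y), rather than by AE-VAL's iterative disjunction of the formulas $P_i(x)$.

```python
from typing import Callable, FrozenSet, Iterable

def region_of_validity(xs: Iterable[int], ys: Iterable[int],
                       S: Callable[[int], bool],
                       T: Callable[[int, int], bool]) -> FrozenSet[int]:
    """Maximal region of validity by enumeration: models of S(x) that extend to T(x, y)."""
    candidates = list(ys)
    return frozenset(x for x in xs if S(x) and any(T(x, y) for y in candidates))

XS, YS = range(10), range(10)

def S(x: int) -> bool:          # hypothetical antecedent: even numbers
    return x % 2 == 0

def T(x: int, y: int) -> bool:  # hypothetical consequent: y = x + 2 must stay in 0..9
    return y == x + 2

R = region_of_validity(XS, YS, S, T)
valid = all(x in R for x in XS if S(x))   # valid iff S(x) => R(x)
# Lemma 1: on every model of S, membership in R coincides with exists y. T(x, y).
lemma1 = all((x in R) == any(T(x, y) for y in YS) for x in XS if S(x))
print(sorted(R), valid, lemma1)           # [0, 2, 4, 6] False True
```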


4 Validity-Guided Synthesis from Assume-Guarantee Contracts


Algorithm 1, named JSYN-VG (for validity-guided), shows the validity-guided
technique that we use towards the automatic synthesis of implementations.
Algorithm 1. JSYN-VG (A: assumptions, G: guarantees)

1: $F(s) \gets \mathit{true}$;  ▷ Fixpoint of viable states
2: while true do
3:   $\phi \gets \forall s, i.\ (F(s) \wedge A(s,i) \Rightarrow \exists s'.\ G_T(s,i,s') \wedge F(s'))$;
4:   $\langle \mathit{valid}, \mathit{validRegion}, \mathit{Skolem} \rangle \gets$ AE-VAL($\phi$);
5:   if valid then
6:     if $\exists s.\ G_I(s) \wedge F(s)$ then
7:       return ⟨REALIZABLE, Skolem, s, F⟩;
8:     else  ▷ Empty set of initial or viable states
9:       return UNREALIZABLE;
10:  else  ▷ Extract region of validity $Q(s, i)$
11:    $Q(s, i) \gets \mathit{validRegion}$;
12:    $\phi' \gets \forall s.\ (F(s) \Rightarrow \exists i.\ A(s,i) \wedge \neg Q(s,i))$;
13:    $\langle \_, \mathit{violatingRegion}, \_ \rangle \gets$ AE-VAL($\phi'$);
14:    $W(s) \gets \mathit{violatingRegion}$;
15:    $F(s) \gets F(s) \wedge \neg W(s)$;  ▷ Refine set of viable states


The specification is written using the Assume-Guarantee convention that we
described in Sect. 3 and is provided as an input. The algorithm relies on AE-VAL;
for each call we write $\langle x, y, z \rangle \gets$ AE-VAL(...), where $x$ specifies whether
the given formula is valid or invalid, $y$ identifies the region of validity (in both
cases), and $z$ is the Skolem function (only if the formula is valid).

The algorithm maintains a formula $F(s)$, which is initially assigned true
(line 1). It then attempts to strengthen $F(s)$ until it only contains viable states
(recall Eqs. 1 and 2), i.e., until a greatest fixpoint is reached. We first encode Eq. 1 in
a formula $\phi$ and then provide it as input to AE-VAL (line 4), which determines
its validity (line 5). If the formula is valid, then the witness Skolem is non-empty.
By construction, it contains valid assignments to the existentially quantified
variables of $\phi$. In the context of viability, this witness is capable of providing
viable states that can be used as a safe reaction, given an input that satisfies
the assumptions.

With the valid formula $\phi$ in hand, it remains to check that the fixpoint
intersects with the initial states, i.e., to find a model of the formula in Eq. 2 by a
simple satisfiability check. If a model exists, it is directly combined with the
extracted witness and used towards an implementation of the system, and the
algorithm terminates (line 7). Otherwise, the contract is unrealizable, since no
state satisfies both the initial-state guarantees $G_I$ and $F$ (in particular, when either
the set of initial states or the set of viable states $F$ is empty).

If $\phi$ is not true for every possible assignment of the universally quantified
variables, AE-VAL provides a region of validity $Q(s, i)$ (line 11). At this point,
one might assume that $Q(s, i)$ is sufficient to restrict $F$ towards a solution. This
is not the case, since $Q(s, i)$ describes a subregion involving both state and input
variables. As such, it may contain constraints over the contract's inputs beyond
those required by $A$, ultimately leading to implementations that only work
correctly for a small part of the input domain.
Fortunately, we can again use AE-VAL's capability of providing regions of
validity to remove the inputs from $Q$. Essentially, we want to discard a state
if even one valid input causes it to violate the formula on line 3. We
denote by $W$ the violating region of $Q$. To construct $W$, AE-VAL determines
the validity of the formula $\phi' = \forall s.\ (F(s) \Rightarrow \exists i.\ A(s, i) \wedge \neg Q(s, i))$ (line 12) and
computes a new region of validity.

If $\phi'$ is invalid, it indicates that there are still non-violating states (i.e.,
outside $W$) that may lead to a fixpoint. Thus, the algorithm removes the unsafe
states from $F(s)$ in line 15, and iterates until a greatest fixpoint for $F(s)$ is
reached. If $\phi'$ is valid, then every state in $F(s)$ is unsafe under some specific input
that satisfies the contract assumptions (since $\neg Q(s, i)$ holds in this case), and
the specification is unrealizable (i.e., in the next iteration, the algorithm will
reach line 9).
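Below is a minimal Python sketch of Algorithm 1 for finite, enumerable domains, using a hypothetical four-state toy contract (not a benchmark from the paper). The two AE-VAL calls (lines 4 and 13) are replaced by brute-force enumeration: Q is the set of pairs (s, i) that have a successor in F, and W is the set of states in F with some valid input outside Q. The function name `jsyn_vg` is illustrative; this is not the JKIND implementation.

```python
from itertools import product

# Hypothetical toy contract over finite domains.
STATES = (0, 1, 2, 3)
INPUTS = (0, 1)

def A(s, i):
    return True

def G_I(s):
    return s in (1, 2)

def G_T(s, i, t):
    if s == 3:
        return False
    if s == 2:
        return t == (3 if i == 1 else 0)
    return t in (0, 1)

def jsyn_vg():
    F = frozenset(STATES)                                        # line 1
    while True:                                                  # line 2
        # Lines 3-4, by enumeration: Q is the region of validity of phi (pairs
        # (s, i) with F(s), A(s, i) and a successor in F); skolem records one.
        Q, skolem = set(), {}
        for s, i in product(sorted(F), INPUTS):
            if A(s, i):
                t = next((t for t in sorted(F) if G_T(s, i, t)), None)
                if t is not None:
                    Q.add((s, i))
                    skolem[(s, i)] = t
        valid = all((s, i) in Q for s, i in product(F, INPUTS) if A(s, i))
        if valid:                                                # line 5
            init = next((s for s in sorted(F) if G_I(s)), None)
            if init is not None:                                 # line 6
                return "REALIZABLE", skolem, init, F             # line 7
            return "UNREALIZABLE", None, None, F                 # line 9
        # Lines 11-14: W holds the states in F with some valid input outside Q.
        W = {s for s in F if any(A(s, i) and (s, i) not in Q for i in INPUTS)}
        F = F - W                                                # line 15

verdict, skolem, init, F = jsyn_vg()
print(verdict, init, sorted(F), skolem)
```

On this toy contract, the loop first removes state 3 (no successor at all), then state 2 (input 1 forces a move into the removed state 3), and stops with F = {0, 1}, which intersects the initial states at state 1.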


4.1 Soundness

Lemma 2. $\mathit{Viable} \Rightarrow F$ is an invariant for Algorithm 1.


Proof. It suffices to show that this invariant holds each time $F$ is assigned. On line 1,
this is trivial. For line 15, we can assume that $\mathit{Viable} \Rightarrow F$ holds prior to this
line. Suppose towards contradiction that the assignment on line 15 violates the
invariant. Then there exists $s_0$ such that $F(s_0)$, $W(s_0)$, and $\mathit{Viable}(s_0)$ all hold.
Since $W$ is the region of validity for $\phi'$ on line 12, we have $W(s_0) \wedge F(s_0) \Rightarrow
\exists i.\ A(s_0, i) \wedge \neg Q(s_0, i)$ by Lemma 1. Given that $W(s_0)$ and $F(s_0)$ hold, let $i_0$
be such that $A(s_0, i_0)$ and $\neg Q(s_0, i_0)$ hold. Since $Q$ is the region of validity for
$\phi$ on line 3, we have $F(s_0) \wedge A(s_0, i_0) \wedge \exists s'.\ G_T(s_0, i_0, s') \wedge F(s') \Rightarrow Q(s_0, i_0)$
by Lemma 1. Since $F(s_0)$, $A(s_0, i_0)$ and $\neg Q(s_0, i_0)$ hold, we conclude that
$\exists s'.\ G_T(s_0, i_0, s') \wedge F(s') = \bot$. We know that $\mathit{Viable} \Rightarrow F$ holds prior to
line 15, thus $\exists s'.\ G_T(s_0, i_0, s') \wedge \mathit{Viable}(s') = \bot$. But this is a contradiction, since
$\mathit{Viable}(s_0)$ and $A(s_0, i_0)$ hold, so by Eq. 1 such a successor $s'$ must exist. Therefore the invariant holds on line 15.


Theorem 1. The REALIZABLE and UNREALIZABLE results of Algorithm 1 are 
sound. 


Proof. If Algorithm 1 terminates, then the formula $\phi$ on line 3 is valid.
Rewritten, $F$ satisfies the formula

$$\forall s.\ F(s) \Rightarrow \big(\forall i.\ A(s,i) \Rightarrow \exists s'.\ G_T(s,i,s') \wedge F(s')\big). \qquad (3)$$

Let the function $f$ be defined over state predicates as

$$f = \lambda V.\ \lambda s.\ \forall i.\ A(s,i) \Rightarrow \exists s'.\ G_T(s,i,s') \wedge V(s'). \qquad (4)$$

State predicates are equivalent to subsets of the state space and form a lattice
in the natural way. Moreover, $f$ is monotone on this lattice. From Eq. 3 we have
$F \Rightarrow f(F)$. Thus $F$ is a post-fixed point of $f$. In Eq. 1, $\mathit{Viable}$ is defined as
the greatest fixed point of $f$. Thus $F \Rightarrow \mathit{Viable}$ by the Knaster-Tarski theorem.
Combining this with Lemma 2, we have $F = \mathit{Viable}$. Therefore the check on
line 6 is equivalent to the check in Eq. 2 for realizability.
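The Knaster-Tarski step can be checked exhaustively on a small example. The Python sketch below, assuming a hypothetical four-state toy contract (not from the paper), enumerates every subset of states, keeps the post-fixed points of f from Eq. 4 (those with F ⊆ f(F), the set-based reading of Eq. 3), and confirms that each one lies inside the greatest fixpoint, i.e., inside Viable.

```python
from itertools import combinations

# Hypothetical toy contract over finite domains.
STATES = (0, 1, 2, 3)
INPUTS = (0, 1)

def A(s, i):
    return True

def G_T(s, i, t):
    if s == 3:
        return False
    if s == 2:
        return t == (3 if i == 1 else 0)
    return t in (0, 1)

def f(V):
    """Eq. 4: the states whose every valid input has a G_T-successor in V."""
    return frozenset(s for s in STATES
                     if all(any(G_T(s, i, t) for t in V) for i in INPUTS if A(s, i)))

def greatest_fixpoint():
    V = frozenset(STATES)          # start from the top of the lattice
    while f(V) != V:
        V = f(V)                   # f is monotone, so this sequence only shrinks
    return V

subsets = [frozenset(c) for r in range(len(STATES) + 1)
           for c in combinations(STATES, r)]
post_fixed = [F for F in subsets if F <= f(F)]     # F => f(F), as in Eq. 3
viable = greatest_fixpoint()
print(all(F <= viable for F in post_fixed))        # True
print(sorted(sorted(F) for F in post_fixed))
```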
const C = 2.0;

-- empty buckets e and e+1 each round (bucket 5 wraps to bucket 1)
node game(i1, i2, i3, i4, i5: real; e: int) returns (guarantee: bool);
var
  b1, b2, b3, b4, b5: real;
let
  assert i1 >= 0.0 and i2 >= 0.0 and i3 >= 0.0 and i4 >= 0.0 and i5 >= 0.0;
  assert i1 + i2 + i3 + i4 + i5 = 1.0;

  b1 = 0.0 -> (if (e = 5 or e = 1) then i1 else (pre(b1) + i1));
  b2 = 0.0 -> (if (e = 1 or e = 2) then i2 else (pre(b2) + i2));
  b3 = 0.0 -> (if (e = 2 or e = 3) then i3 else (pre(b3) + i3));
  b4 = 0.0 -> (if (e = 3 or e = 4) then i4 else (pre(b4) + i4));
  b5 = 0.0 -> (if (e = 4 or e = 5) then i5 else (pre(b5) + i5));

  guarantee = b1 <= C and b2 <= C and b3 <= C and b4 <= C and b5 <= C;

  --%REALIZABLE i1, i2, i3, i4, i5;
  --%PROPERTY guarantee;
tel;


Fig. 3. An Assume-Guarantee contract for the Cinderella-Stepmother game in Lustre. 


4.2 Termination on Finite Models 


Lemma 3. Every loop iteration in Algorithm 1 either terminates or removes at 
least one state from F. 


Proof. It suffices to show that at least one state is removed from $F$ on line 15.
That is, we want to show that $F \cap W \neq \emptyset$, since this intersection is what is
removed from $F$ by line 15.

If the query on line 4 is valid, then the algorithm terminates. If not, then
there exist a state $s^*$ and an input $i^*$ such that $F(s^*)$ and $A(s^*, i^*)$ hold, but
there is no state $s'$ where both $G_T(s^*, i^*, s')$ and $F(s')$ hold. Thus $\neg Q(s^*, i^*)$,
and $s^* \in \mathit{violatingRegion}$, so $W \neq \emptyset$. Next, suppose towards contradiction that
$F \cap W = \emptyset$ and $W \neq \emptyset$. Since $W$ is the region of validity for $\phi'$ on line 12,
we know that $F$ lies completely outside the region of validity, and therefore
$\forall s.\ F(s) \Rightarrow \neg \exists i.\ A(s, i) \wedge \neg Q(s, i)$ by Lemma 1. Rewritten, $\forall s, i.\ F(s) \wedge A(s, i) \Rightarrow Q(s, i)$. Note
that $Q$ is the region of validity for $\phi$ on line 3. Thus $F \wedge A$ is completely contained
within the region of validity, and formula $\phi$ is valid. This is a contradiction, since if
$\phi$ is valid then line 15 will not be executed in this iteration of the loop. Therefore
$F \cap W \neq \emptyset$, and at least one state is removed from $F$ on line 15.


Theorem 2. For finite models, Algorithm 1 terminates. 


Proof. Immediately from Lemma 3 and the fact that AE-VAL terminates on 
finite models [10]. 


4.3 Applying JSYN-VG to the Cinderella-Stepmother Game


Figure 3 shows one possible interpretation of the contract designed for the
instance of the Cinderella-Stepmother game that we introduced in Sect. 2. The
contract is expressed in Lustre [18], a language that has been extensively used
for specification as well as implementation of safety-critical systems, and is the 
kernel language in SCADE, a popular tool in model-based development. The 
contract is defined as a Lustre node game, with a global constant C denoting the 
bucket capacity. The node describes the game itself, through the problem’s input 
and output variables. The main input is Stepmother’s distribution of one unit 
of water over five different input variables, i1 to i5. While the node contains a 
sixth input argument, namely e, this is in fact used as the output of the system 
that we want to implement, representing Cinderella’s choice at each of her turns. 

We specify the system's inputs i1, ..., i5 using the REALIZABLE statement
and define the contract's assumptions over them: $A(i_1, \ldots, i_5) = \big(\bigwedge_{k=1}^{5} i_k \ge 0.0\big) \wedge \big(\sum_{k=1}^{5} i_k = 1.0\big)$. The assignment to the Boolean variable guarantee (distinguished
via the PROPERTY statement) imposes the guarantee constraints on the
buckets' states through the entire duration of the game, using the local variables
b1 to b5. Initially, each bucket is empty, and with each transition to a new
state, the contents depend on whether Cinderella chose the specific bucket or
an adjacent one. If so, the value of each $b_k$ at the next turn becomes equal to
the value of the corresponding input variable $i_k$. Formally, for the initial state,
$G_I(C, b_1, \ldots, b_5) = \big(\bigwedge_{k=1}^{5} b_k = 0.0\big) \wedge \big(\bigwedge_{k=1}^{5} b_k \le C\big)$, while the transitional
guarantee is $G_T([C, b_1, \ldots, b_5, e], i_1, \ldots, i_5, [C', b'_1, \ldots, b'_5, e']) = \big(\bigwedge_{k=1}^{5} b'_k = \mathit{ite}(e = k \vee e = k_{prev}, i_k, b_k + i_k)\big) \wedge \big(\bigwedge_{k=1}^{5} b'_k \le C'\big)$, where $k_{prev} = 5$ if $k = 1$, and
$k_{prev} = k - 1$ otherwise. Interestingly, the lack of explicit constraints over $e$,
i.e., Cinderella's choice, permits Cinderella to skip her current
turn, i.e., to empty none of the buckets. With the addition
of the guarantee $(e = 1) \vee \ldots \vee (e = 5)$, the contract is still realizable and
the implementation is verifiable, but Cinderella is no longer allowed to skip her turn.

If the bucket was not covered by Cinderella's choice, then its contents are
updated by adding Stepmother's distribution to the volume of water that the
bucket already had. The arrow (->) operator distinguishes the initial state (on
the left) from subsequent states (on the right), and variable values in the
previous state can be accessed using the pre operator. The contract should only be
realizable if, assuming valid inputs given by the Stepmother (i.e., non-negative values
of the input variables that add up to one water unit), Cinderella can keep reacting
indefinitely by providing outputs that satisfy the guarantees (i.e., she empties
buckets in order to prevent overflow in Stepmother's next turn). We provide the
contract in Fig. 3 as input to Algorithm 1, which then iteratively attempts to
construct a fixpoint of viable states, closed under the transition relation.
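For readers who prefer an executable reading of the guarantees above, the following Python sketch encodes A, $G_I$, and $G_T$ for the game with capacity C = 2, using exact rational arithmetic (`fractions.Fraction`) so that the assumption $\sum_k i_k = 1$ can be checked without rounding. The function names are hypothetical, and e is taken as Cinderella's choice for the round in which the inputs $i_k$ are poured, as in the Lustre node of Fig. 3.

```python
from fractions import Fraction
from typing import List, Sequence

C = Fraction(2)  # bucket capacity, as in Fig. 3

def k_prev(k: int) -> int:
    """Left neighbour of bucket k; the five buckets form a cycle."""
    return 5 if k == 1 else k - 1

def assumptions(i: Sequence[Fraction]) -> bool:
    """A: the Stepmother pours non-negative amounts that add up to exactly one unit."""
    return all(x >= 0 for x in i) and sum(i) == 1

def g_init(b: Sequence[Fraction]) -> bool:
    """G_I: every bucket starts empty (and hence within capacity)."""
    return all(x == 0 for x in b) and all(x <= C for x in b)

def next_buckets(b: Sequence[Fraction], e: int, i: Sequence[Fraction]) -> List[Fraction]:
    """b'_k = i_k if e = k or e = k_prev(k), and b_k + i_k otherwise."""
    return [i[k - 1] if e in (k, k_prev(k)) else b[k - 1] + i[k - 1]
            for k in range(1, 6)]

def g_trans(b: Sequence[Fraction], e: int, i: Sequence[Fraction],
            b_next: Sequence[Fraction]) -> bool:
    """G_T: b_next follows the update rule and no bucket exceeds C."""
    return list(b_next) == next_buckets(b, e, i) and all(x <= C for x in b_next)

pour = [Fraction(1, 5)] * 5
b0 = [Fraction(0)] * 5
b1 = next_buckets(b0, 1, pour)   # Cinderella empties buckets 1 and 2
print(assumptions(pour), g_init(b0), g_trans(b0, 1, pour, b1))   # True True True
```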

Initially $F = \mathit{true}$, and we query AE-VAL for the validity of the formula
$\forall i_1, \ldots, i_5, b_1, \ldots, b_5.\ A(i_1, \ldots, i_5) \Rightarrow \exists b'_1, \ldots, b'_5, e.\ G_T(i_1, \ldots, i_5, b_1, \ldots, b_5, b'_1, \ldots, b'_5, e)$.
Since $F = \mathit{true}$ does not yet exclude any state, there are states satisfying $A$ for which there is no
transition satisfying $G_T$. In particular, one such counterexample identified by AE-VAL
is represented by the set of assignments $cex = \{\ldots, b_4 = 3025, i_4 = 0.2, b'_4 = 3025.2, \ldots\}$, where the already overflown bucket $b_4$ receives additional water
during the transition to the next state, violating the contract guarantees. In addition,
AE-VAL provides us with a region of validity $Q(i_1, \ldots, i_5, b_1, \ldots, b_5)$, a
formula for which $\forall i_1, \ldots, i_5, b_1, \ldots, b_5.\ A(i_1, \ldots, i_5) \wedge Q(i_1, \ldots, i_5, b_1, \ldots, b_5) \Rightarrow
\exists b'_1, \ldots, b'_5, e.\ G_T(i_1, \ldots, i_5, b_1, \ldots, b_5, b'_1, \ldots, b'_5, e)$ is valid. The precise encoding of $Q$
is too large to be presented in the paper; intuitively, it contains some constraints on
$i_1, \ldots, i_5$ and $b_1, \ldots, b_5$ which are stronger than $A$ and which block the inclusion
of violating states such as the one described by $cex$.

Since $Q$ is defined over both state and input variables, it might contain
constraints over the inputs, which is an undesirable side effect. In the next step,
AE-VAL decides the validity of the formula $\forall b_1, \ldots, b_5.\ \exists i_1, \ldots, i_5.\ A(i_1, \ldots, i_5) \wedge
\neg Q(i_1, \ldots, i_5, b_1, \ldots, b_5)$ and extracts a violating region $W$ over $b_1, \ldots, b_5$. The
precise encoding of $W$ is also too large to be presented in the paper; intuitively,
it captures certain states in which Cinderella may not take the optimal action.
Blocking them eventually leads us to proving the contract's realizability.

From this point on, the algorithm continues following the steps explained
above. In particular, it terminates after one more refinement, at depth 2. At
that point, the refined version of $\phi$ is valid, and AE-VAL constructs a witness
containing valid reactions to environment behavior. In general, the witness is
described through nested if-then-else blocks, where the conditions
are subsets of the antecedent of the implication in formula $\phi$, while the bodies
contain valid assignments to state variables for the corresponding subset.
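The shape of such a witness can be illustrated with a small Python sketch that evaluates an ordered list of guarded choices, which is equivalent to a chain of nested if-then-else blocks. The guards below are hypothetical (choose e = k when bucket k is the fullest, which empties bucket k and its right neighbour); they are not the witness constructed by AE-VAL, and this heuristic is not claimed to be a winning strategy.

```python
from typing import Callable, List, Sequence, Tuple

# A guard inspects the current bucket levels b and the Stepmother's pour i.
Guard = Callable[[Sequence[float], Sequence[float]], bool]

def make_witness(branches: List[Tuple[Guard, int]],
                 default: int) -> Callable[[Sequence[float], Sequence[float]], int]:
    """Equivalent to: if g1 then v1 else if g2 then v2 ... else default."""
    def witness(b: Sequence[float], i: Sequence[float]) -> int:
        for guard, choice in branches:
            if guard(b, i):
                return choice
        return default
    return witness

# Hypothetical guards (not AE-VAL output): pick e = k when bucket k is the fullest,
# which empties bucket k and its right neighbour (bucket 5 wraps to bucket 1).
branches = [(lambda b, i, k=k: b[k - 1] == max(b), k) for k in range(1, 6)]
choose_e = make_witness(branches, default=1)

print(choose_e([0.0, 0.0, 1.5, 0.2, 0.0], [0.2] * 5))   # 3
```

Because guards are tried in order, ties are resolved in favour of the earliest branch, just as in a nested if-then-else.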


5 Implementation and Evaluation 


The implementation of the algorithm has been added to a branch of the
JKIND [13] model checker¹. JKIND officially supports synthesis using a
k-inductive approach, named JSYN [19]. For clarity, we named our validity-guided
technique JSYN-VG (i.e., validity-guided synthesis). JKIND uses Lustre [18] as
its specification and implementation language. JSYN-VG encodes Lustre
specifications in the language of linear real and integer arithmetic (LIRA) and
communicates them to AE-VAL². Skolem functions returned by AE-VAL are then
translated into an efficient and practical implementation. To compare the
quality of implementations against JSYN, we use SMTLIB2C, a tool that has been
specifically developed to translate Skolem functions to C implementations³.


5.1 Experimental Results 


We evaluated JSYN-VG by synthesizing implementations for 124 contracts⁴
originating from a broad variety of contexts. Since we have been unable to find past
work that contains benchmarks directly relevant to our approach, we propose a
comprehensive collection of contracts that can be used by the research community
for future advancements in reactive system synthesis for contracts that rely on
infinite theories. Our benchmarks are split into three categories:


¹ The JKIND fork with JSYN-VG is available at https://goo.gl/WxupTe.
² The AE-VAL tool is available at https://goo.gl/CbNMVN.
³ The SMTLIB2C tool is available at https://goo.gl/EvNrAU.
⁴ All of the benchmark contracts can be found at https://goo.gl/2p4sT9.
• 59 contracts correspond to various industrial projects, such as a
Quad-Redundant Flight Control System, a Generic Patient Controlled Analgesia
infusion pump, as well as a collection of contracts for a Microwave model,
written by graduate students as part of a software engineering class;

• 54 contracts were initially used for the verification of existing handwritten
implementations [16];

• 11 models contain variations of the Cinderella-Stepmother game, as well as
examples that we created.


All of the synthesized implementations were verified against the original con- 
tracts using JKIND. 

The goal of this experiment was to determine the performance and generality 
of the JSYN-VG algorithm. We compared against the existing JSYN algorithm, 
and for the Cinderella model, we compared against [2] (this was the only syn- 
thesis problem in the paper). We examined the following aspects: 


• time required to synthesize an implementation;

• size of generated implementations in lines of code (LoC);

• execution speed of generated C implementations derived from the synthesis
procedure; and

• number of contracts that could be synthesized by each approach.


Since JKIND already supports synthesis through JSYN, we were able to directly 
compare JSYN-VG against JSYN’s k-inductive approach. We ran the experiments 
using a computer with Intel Core i3-4010U 1.70 GHz CPU and 16GB RAM. 

A listing of the statistics that we tracked while running experiments is
presented in Table 1. Figure 4a shows the time taken by JSYN and JSYN-VG to
solve each problem, with JSYN-VG outperforming JSYN for the vast majority
of the benchmark suite, often by a margin greater than 50%. Figure 4b,
on the other hand, depicts small differences in the overall size between the
synthesized implementations. While it would be reasonable to conclude that
there are no noticeable improvements, the big picture is different: solutions by
JSYN-VG always require just a single Skolem function, but solutions by JSYN
may require several (k − 1 to initialize the system, and one for the inductive
step). In our evaluation, JSYN proved the realizability of the majority of
benchmarks by constructing proofs of length k = 0, which essentially means that the
entire space of states is an inductive invariant. However, several spikes in Fig. 4b
refer to benchmarks for which JSYN constructed a proof of length k > 0, which
was significantly longer than the corresponding proof by JSYN-VG. Interestingly,
we also noticed cases where JSYN implementations are (insignificantly) shorter.
This provides us with another observation regarding the formulation of the
problem for k = 0 proofs. In these cases, JSYN proves the existence of viable states,
starting from a set of pre-initial states, where the contract does not need to
hold. This has direct implications for the way that the ∀∃-formulas are
constructed in JSYN's underlying machinery, where the assumptions are "baked"
into the transition relation, thus affecting the performance of AE-VAL.
Table 1. Benchmark statistics.

| Metric | JSYN | JSYN-VG |
| Problems solved | 113 | 124 |
| Performance (avg, seconds) | 5.72 | 2.78 |
| Performance (max, seconds) | 352.1 | 167.55 |
| Implementation size (avg, lines of code) | 72.88 | 70.66 |
| Implementation size (max, lines of code) | 2322 | 2142 |
| Implementation performance (avg, ms) | 57.84 | 56.32 |
| Implementation performance (max, ms) | 485.88 | 459.95 |


Table 2. Cinderella-Stepmother results.

| Game | JSYN-VG impl. size (LoC) | JSYN-VG impl. performance (ms) | JSYN-VG time | CONSYNTH [2] time (Z3) | CONSYNTH [2] time (Barcelogic) |
| Cind (C = 3) | 204 | 128.09 | 4.5s | 3.2s | 1.2s |
| Cind2 (C = 3) | 2081 | 160.87 | 28.7s | n/a | n/a |
| Cind (C = 2) | 202 | 133.04 | 4.7s | 1m52s | 1m52s |
| Cind2 (C = 2) | 1873 | 182.19 | 27.2s | n/a | n/a |

Fig. 4. Experimental results.
One last statistic that we tracked was the performance of the synthesized C
implementations in terms of execution time, which can be seen in Fig. 4c. The
performance was computed as the mean of 1,000,000 iterations of executing each
implementation using random input values. According to the figure as well as
Table 1, the differences are minuscule on average.
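A measurement of this kind can be sketched as follows. This is a hypothetical Python harness, not the authors' C benchmark code: `random_pour` draws non-negative pours summing to one unit (uniformly on the simplex, via sorted uniform cut points), and `mean_step_time` reports the mean wall-clock time per call of a stand-in `step` function over a fixed pool of such inputs, including Python loop overhead.

```python
import random
import time
from typing import Callable, List, Sequence

def random_pour(rng: random.Random, n: int = 5) -> List[float]:
    """Non-negative amounts summing to one unit, uniform on the simplex (sorted cut points)."""
    cuts = sorted(rng.random() for _ in range(n - 1))
    points = [0.0] + cuts + [1.0]
    return [points[k + 1] - points[k] for k in range(n)]

def mean_step_time(step: Callable[[Sequence[float]], object],
                   iterations: int = 1_000_000, seed: int = 0) -> float:
    """Mean seconds per call of `step`, cycling over a fixed pool of random valid inputs."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    rng = random.Random(seed)
    pool = [random_pour(rng) for _ in range(min(iterations, 10_000))]
    start = time.perf_counter()
    for k in range(iterations):
        step(pool[k % len(pool)])
    return (time.perf_counter() - start) / iterations

# `step` is a stand-in for a synthesized implementation (hypothetical).
print(mean_step_time(lambda pour: max(range(5), key=pour.__getitem__), iterations=10_000))
```

Generating the inputs before starting the clock keeps random-number generation out of the measured time.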

Figure 4 does not cover the entirety of the benchmark suite. Of the original
124 problems, eleven cannot be solved by JSYN's k-inductive approach.
Four of these files are variations of the Cinderella-Stepmother game using
different representations of the game, as well as two different values for the bucket
capacity (2 and 3). Using the variation in Fig. 3 as an input to JSYN, we receive
an "unrealizable" answer, with the counterexample shown in Fig. 5. Reading
through the feedback provided by JSYN, it is apparent that the underlying SMT
solver is incapable of choosing the correct buckets to empty, leading eventually
to a state where an overflow occurs for the third bucket. As we already discussed,
though, a winning strategy exists for the Cinderella game as long as the bucket
capacity C is between 1.5 and 3. This provides an excellent demonstration of the
inherent weakness of JSYN for determining unrealizability. JSYN-VG's
validity-guided approach is able to prove the realizability for these contracts, as well as
synthesize an implementation for each.

Table 2 shows how JSYN-VG performed on the four contracts describing the
Cinderella-Stepmother game. We used two different interpretations of the game,
and exercised both for the cases where the bucket capacity C is equal to 2
and 3. Regarding the synthesized implementations, their size reflects
the complexity of the program (Cinderella2 contains more local variables and
a helper function to empty buckets). Despite this, the implementation
performance remains comparable across all implementations (between 128.09 and 182.19 ms). For
reference, the table also contains the results from the template-based approach followed in
CONSYNTH [2]. From the results, it is apparent that providing templates yields better
performance for the case of C = 3, but our approach outperforms CONSYNTH
when it comes to solving the harder case of C = 2. Finally, the original paper for
CONSYNTH also explores the synthesis of winning strategies for Stepmother using
the liveness property that a bucket will eventually overflow. While JKIND does
not natively support liveness properties, we successfully synthesized an
implementation for Stepmother using a bounded notion of liveness with counters. We
leave an evaluation of this category of specifications for future work.

Overall, JSYN-VG's validity-guided approach provides significant advantages
over the k-inductive technique followed in JSYN, and effectively expands JKIND's
solving capabilities regarding specification realizability. On top of that, it
provides an efficient "hands-off" approach that is capable of solving complex games.
The most significant contribution, however, is the generality of this approach:
it is not tied to a specific environment and can be extended to support
more theories, as well as further categories of specifications.
UNREALIZABLE || K = 6 || Time = 2.017s

Step        0        1        2        3        4        5
INPUTS
i1          0        0        0        0.416*   0.944*   0.666*
i2          1        0        0.083*   0.083*   0        0.055*
i3          0        1        0.305*   0.5      0.027*   0.194*
i4          0        0        0.611*   0        0        0.027*
i5          0        0        0        0        0.027*   0.055*
OUTPUTS
e           1        3        1        5        4        5
NODE OUTPUTS
guarantee   true     true     true     true     true     false
NODE LOCALS
b1          0        0        0        0.416*   1.361*   0.666*
b2          0        0        0.083*   0.166*   0.166*   0.222*
b3          0        1        1.305*   1.805*   1.833*   2.027*
b4          0        0        0.611*   0.611*   0        0.027*
b5          0        0        0        0        0.027*   0.055*

* display value has been truncated


Fig. 5. Spurious counterexample for the Cinderella-Stepmother example using JSYN.
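The spurious trace can be replayed mechanically. The Python sketch below, assuming the displayed (truncated) values from Fig. 5 and the update rule of Fig. 3, recomputes the buckets step by step; bucket 3 accumulates 1 + 0.305 + 0.5 + 0.027 + 0.194 = 2.026 > 2 by step 5, so the guarantee fails only there. Because the inputs are truncated, recomputed bucket values can differ from the displayed ones in the last digit.

```python
from typing import List, Sequence, Tuple

C = 2.0
# Displayed (truncated) values from Fig. 5; one row per step 0..5.
POURS = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0.083, 0.305, 0.611, 0],
    [0.416, 0.083, 0.5, 0, 0],
    [0.944, 0, 0.027, 0, 0.027],
    [0.666, 0.055, 0.194, 0.027, 0.055],
]
CHOICES = [1, 3, 1, 5, 4, 5]   # Cinderella's choice e at each step

def replay(pours: Sequence[Sequence[float]], choices: Sequence[int],
           cap: float = C) -> Tuple[List[float], List[bool]]:
    b = [0.0] * 5
    guarantee = []
    for step, (i, e) in enumerate(zip(pours, choices)):
        if step > 0:   # step 0 keeps the initial value 0.0, as with Lustre's '->'
            # bucket k (1-based) is reset if e = k or e = k_prev(k)
            b = [i[k - 1] if e in (k, 5 if k == 1 else k - 1) else b[k - 1] + i[k - 1]
                 for k in range(1, 6)]
        guarantee.append(all(x <= cap for x in b))
    return b, guarantee

final, guarantee = replay(POURS, CHOICES)
print(guarantee)             # [True, True, True, True, True, False]
print(round(final[2], 3))    # bucket 3: 2.026
```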


6 Related Work 


The work presented in this paper is closely related to approaches that attempt to 
construct infinite-state implementations. Some focus on the continuous interac- 
tion of the user with the underlying machinery, either through the use of temp- 
lates [2,28], or environments where the user attempts to guide the solver by 
choosing reactions from a collection of different interpretations [26]. In contrast, 
our approach is completely automatic and does not require human ingenuity to 
find a solution. Most importantly, the user does not need to be deeply familiar 
with the problem at hand. 

Iterative strengthening of candidate formulas is also used in abductive
inference [8] of loop invariants. Their approach generates candidate invariants as
maximum universal subsets (MUS) of quantifier-free formulas of the form $\phi \Rightarrow \psi$.
While a MUS may be sufficient to prove validity, it may also mislead the
invariant search, so the authors use a backtracking procedure that discovers new
subsets while avoiding spurious results. By comparison, in our approach the regions
of validity are maximal and therefore backtracking is not required. More
importantly, reactive synthesis requires mixed-quantifier formulas, and it requires that
inputs are unconstrained (other than by the contract assumptions), so
substantial modifications to the MUS algorithm would be necessary to apply the approach
of [8] for reactive synthesis.

The concept of synthesizing implementations by discovering fixpoints was
inspired mostly by IC3/PDR [4,9], which was first introduced in the context
of verification. Work from Cimatti et al. effectively applied this idea to
parameter synthesis in the HyComp model checker [5,6]. Discovering fixpoints
to synthesize reactive designs was first covered extensively by Piterman et al. [23],
who proved that the problem can be solved in cubic time for the class of GR(1)
specifications. Their algorithm requires the discovery of least fixpoints for the
state variables, each one covering a greatest fixpoint of the input variables. If
the specification is realizable, the entirety of the input space is covered by the
greatest fixpoints. In contrast, our approach computes a single greatest fixpoint
over the system’s outputs and avoids partitioning the input space. As the
tools use different notations and support different logical fragments, practical
comparisons are not straightforward and are thus left for future work.

More recently, Preiner et al. presented work on model synthesis [24] that
employs a counterexample-guided refinement process [25] to construct and check
candidate models. Internally, it relies on enumerative learning, a syntax-based
technique that enumerates expressions, checks their validity against ground test
cases, and proceeds to generalize the expressions by constructing larger ones.
In contrast, our approach is syntax-insensitive in terms of generating regions of
validity. In general, enumeration techniques such as the one used in CONSYNTH’s
underlying E-HSF engine [2] are not an optimal strategy for our class of problems,
since the witnesses constructed for the most complex contracts are described by
nested if-then-else expressions of depth (i.e., number of branches) 10-20, at
which point the number of candidate solutions is so large that the search space
explodes.


7 Conclusion and Future Work 


We presented a novel and elegant approach towards the synthesis of reactive
systems, using only the knowledge provided by the system specification expressed in
infinite theories. The main goal is to converge to a fixpoint by iteratively blocking
subsets of unsafe states from the problem space. This is achieved through the
continuous extraction of regions of validity, which hint towards subsets of states
that lead to a candidate implementation.
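To make the idea of iteratively blocking unsafe states concrete, the following is a minimal explicit-state sketch of a greatest-fixpoint computation for a finite safety game. It is only an analogue under simplifying assumptions (finite state, input, and output sets; all names are hypothetical): JSYN-VG itself works symbolically over infinite theories and extracts regions of validity of $\forall\exists$-formulas, which this sketch replaces by brute-force enumeration. Each iteration keeps a state only if, for every input, some output satisfies the guarantee and leads back into the current region.

```python
from typing import Callable, FrozenSet, Hashable, Iterable

State = Hashable


def greatest_fixpoint(
    states: Iterable[State],
    inputs: Iterable[Hashable],
    outputs: Iterable[Hashable],
    step: Callable[[State, Hashable, Hashable], State],
    guarantee: Callable[[State, Hashable, Hashable], bool],
) -> FrozenSet[State]:
    """Largest set W such that every s in W satisfies:
    for all inputs i there exists an output o with guarantee(s, i, o)
    and step(s, i, o) in W (explicit-state stand-in for a forall-exists check)."""
    input_list, output_list = list(inputs), list(outputs)
    region = frozenset(states)
    while True:
        # Block every state from which some input leaves the system no safe reaction.
        refined = frozenset(
            s
            for s in region
            if all(
                any(guarantee(s, i, o) and step(s, i, o) in region for o in output_list)
                for i in input_list
            )
        )
        if refined == region:  # no further state was blocked: fixpoint reached
            return region
        region = refined
```

In a hypothetical toy game where the environment adds i to a counter, the system subtracts o, and the guarantee demands that the next value stay in [0, 3], the region stabilizes at {0, 1, 2, 3} when i ≤ 1. When the environment may add 2 per step while the system can remove only 1, every state is eventually blocked and the region becomes empty, i.e., that toy specification is unrealizable.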

This is, to the best of our knowledge, the first complete attempt at handling
valid subsets of a $\forall\exists$-formula to construct a greatest fixpoint on specifications
expressed using infinite theories. We demonstrated its effectiveness in practice
by comparing it to an existing approach that focuses on constructing
k-inductive proofs of realizability, and showed that the new algorithm performs
better than the k-inductive approach, both in terms of performance and in terms of
the soundness of results. In the future, we would like to extend the applicability of
this algorithm to other areas in formal verification, such as invariant generation.
Another interesting goal is to make the proposed benchmark collection available
to competitions such as SYNTCOMP, by establishing a formal extension of
the TLSF format to support infinite-state problems [17]. Finally, a particularly
interesting challenge is that of mapping infinite theories to finite counterparts,
enabling the synthesis of secure and safe implementations.
Data Availability Statement. The datasets generated during and/or analyzed
during the current study are available in the figshare repository:
https://doi.org/10.6084/m9.figshare.5904904.v1 [20].
Abstract. We present RVHyper, a runtime verification tool for hyperproperties.
Hyperproperties, such as non-interference and observational
determinism, relate multiple computation traces with each other.
Specifications are given as formulas in the temporal logic HyperLTL, which
extends linear-time temporal logic (LTL) with trace quantifiers and trace
variables. RVHyper processes execution traces sequentially until a
violation of the specification is detected. In this case, a counterexample,
in the form of a set of traces, is returned. As an example application,
we show how RVHyper can be used to detect spurious dependencies in
hardware designs.


1 Introduction 


Hyperproperties [4] generalize trace properties in that they not only check the
correctness of individual computation traces in isolation, but relate multiple
computation traces to each other. HyperLTL [3] is a logic for expressing temporal
hyperproperties, by extending linear-time temporal logic with explicit trace
quantification. HyperLTL has been used to specify a variety of information-flow
and security properties. Examples include classical properties like non-interference
and observational determinism, as well as quantitative information-flow
properties, symmetries in hardware designs, and formally verified error
correcting codes [8]. While model checking and satisfiability checking tools for
HyperLTL already exist [5,8], the runtime verification of HyperLTL specifications
has so far, despite recent theoretical progress [1,2,7], not been supported
by practical tool implementations.

Monitoring hyperproperties is difficult: in principle, the monitor not only
needs to process every observed trace, but must also store every trace observed
so far, so that future traces can be compared with them. On the
other hand, a runtime verification tool for hyperproperties is certainly useful,
in particular if the implementation of a security-critical system is not available.
Even without access to the source code, monitoring the observable execution
traces can still detect insecure information flow.

In this paper, we present RVHyper, a runtime verification tool for monitoring
temporal hyperproperties. RVHyper tackles this challenging problem by implementing
two major optimizations: (1) a trace analysis, which detects all redundant
traces that can be omitted during the monitoring process, and (2) a specification
analysis to detect exploitable properties of a hyperproperty, such as symmetry.

We have applied RVHyper to classical information-flow security problems, such as
checking for violations of observational determinism. HyperLTL is, however, not limited
to security policies. As an example of an application beyond security, we show
how RVHyper can be used to detect spurious dependencies in hardware designs.


2 RVHyper 


In this section we give an overview of the monitoring approach, including the
input and output of the monitoring algorithm and the two major optimization
techniques implemented in RVHyper.


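Specification. The input to RVHyper is a HyperLTL specification.
HyperLTL [3] is a temporal logic for specifying hyperproperties. The logic
extends LTL with quantification over trace variables $\pi$ and a method to link
atomic propositions to specific traces. The set of trace variables is $\mathcal{V}$. Formulas
in HyperLTL are given by the grammar

$$\psi ::= \forall \pi.\, \psi \mid \exists \pi.\, \psi \mid \varphi, \quad \text{and}$$
$$\varphi ::= a_\pi \mid \neg \varphi \mid \varphi \vee \varphi \mid \bigcirc \varphi \mid \varphi\, \mathcal{U}\, \varphi,$$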


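where $a \in AP$ and $\pi \in \mathcal{V}$. The finite trace semantics [2] for HyperLTL is based
on the finite trace semantics of LTL. In the following, when using $\mathcal{L}(\varphi)$ we refer
to the finite trace semantics of a HyperLTL formula $\varphi$. Let $t$ be a finite trace; $\epsilon$
denotes the empty trace, and $|t|$ denotes the length of a trace. Since we are in a
finite trace setting, $t[i,\ldots]$ denotes the subsequence from position $i$ to position
$|t|-1$. Let $\Pi_{fin} : \mathcal{V} \to \Sigma^*$ be a partial function mapping trace variables to finite
traces. We define $\epsilon[0]$ as the empty set. $\Pi_{fin}[i,\ldots]$ denotes the trace assignment
that is equal to $\Pi_{fin}(\pi)[i,\ldots]$ for all $\pi$. We define a subsequence of $t$ as follows:

$$t[i,j] = \begin{cases} \epsilon & \text{if } i \geq |t| \\ t[i, \min(j, |t|-1)] & \text{otherwise} \end{cases}$$

$$\begin{array}{lll}
\Pi_{fin} \models_T a_\pi & \text{if} & a \in \Pi_{fin}(\pi)[0] \\
\Pi_{fin} \models_T \neg\varphi & \text{if} & \Pi_{fin} \not\models_T \varphi \\
\Pi_{fin} \models_T \varphi_1 \vee \varphi_2 & \text{if} & \Pi_{fin} \models_T \varphi_1 \text{ or } \Pi_{fin} \models_T \varphi_2 \\
\Pi_{fin} \models_T \bigcirc\varphi & \text{if} & \Pi_{fin}[1,\ldots] \models_T \varphi \\
\Pi_{fin} \models_T \varphi_1\, \mathcal{U}\, \varphi_2 & \text{if} & \exists i \geq 0.\ \Pi_{fin}[i,\ldots] \models_T \varphi_2 \wedge \forall 0 \leq j < i.\ \Pi_{fin}[j,\ldots] \models_T \varphi_1 \\
\Pi_{fin} \models_T \exists \pi.\, \psi & \text{if} & \text{there is some } t \in T \text{ such that } \Pi_{fin}[\pi \mapsto t] \models_T \psi
\end{array}$$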
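The finite-trace semantics above can be read as a recursive evaluator. The Python sketch below is a direct, unoptimized transcription written for illustration only; it is not RVHyper's automaton-based implementation. Formulas are nested tuples with hypothetical tags ("ap", "not", "or", "next", "until", "exists", "forall"), traces are tuples of sets of atomic propositions, and the ∀ case, which the definitions above do not spell out, is assumed to range over every t ∈ T. Because every suffix beyond the longest trace is the all-empty assignment, the existential position in U only needs to range up to that length.

```python
from typing import Dict, FrozenSet, Sequence, Tuple

Trace = Tuple[FrozenSet[str], ...]
Assignment = Dict[str, Trace]
Formula = tuple  # e.g. ("until", ("ap", "a", "p"), ("ap", "b", "p"))


def shift(pi: Assignment, i: int) -> Assignment:
    """Pi_fin[i, ...]: drop the first i positions of every assigned trace."""
    return {var: t[i:] for var, t in pi.items()}


def holds(phi: Formula, pi: Assignment, T: Sequence[Trace]) -> bool:
    """Finite-trace satisfaction Pi_fin |=_T phi."""
    op = phi[0]
    if op == "ap":  # a_pi: a occurs at the first position of the trace bound to pi
        _, a, var = phi
        t = pi[var]
        return len(t) > 0 and a in t[0]  # epsilon[0] is the empty set
    if op == "not":
        return not holds(phi[1], pi, T)
    if op == "or":
        return holds(phi[1], pi, T) or holds(phi[2], pi, T)
    if op == "next":
        return holds(phi[1], shift(pi, 1), T)
    if op == "until":
        # Past the longest trace every suffix is empty, so larger i add nothing new.
        horizon = max((len(t) for t in pi.values()), default=0)
        for i in range(horizon + 1):
            if holds(phi[2], shift(pi, i), T):
                return True   # phi_2 holds here and phi_1 held at all earlier positions
            if not holds(phi[1], shift(pi, i), T):
                return False  # phi_1 broke before phi_2 was reached
        return False
    if op in ("exists", "forall"):
        _, var, body = phi
        results = [holds(body, {**pi, var: t}, T) for t in T]
        return any(results) if op == "exists" else all(results)
    raise ValueError(f"unknown operator: {op!r}")
```

For example, with T containing the traces ({a}, {a}, {b}) and ({b}), the formula ∀π. a_π U b_π holds, whereas ∀π. ○a_π fails because the second trace has no position 1.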
input : ∀ⁿ HyperLTL formula φ, set of traces T, fresh trace t
output: satisfied or n-ary tuple witnessing violation

M_φ = build_template(φ);
for each tuple N ∈ (T ∪ {t})ⁿ do
    if M_φ accepts N then
        proceed;
    else
        return N;
    end
end
return satisfied;

Algorithm 1. A high-level sketch of the monitoring algorithm for ∀ⁿ
HyperLTL formulas.


input : HyperLTL formula φ, redundancy-free trace set T, fresh trace t
output: redundancy-free set of traces T_min ⊆ T ∪ {t}

M_φ = build_template(φ);
foreach t' ∈ T do
    if t' dominates t then
        return T;
    end
end
foreach t' ∈ T do
    if t dominates t' then
        T = T \ {t'};
    end
end
return T ∪ {t};

Algorithm 2. Trace analysis algorithm to minimize trace storage.


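For example, the above-mentioned observational determinism can be formalized as
the HyperLTL formula $\forall \pi. \forall \pi'. (O_\pi = O_{\pi'})\, \mathcal{W}\, (I_\pi \neq I_{\pi'})$, where $\mathcal{W}$ is the weak
version of $\mathcal{U}$, i.e., $\varphi_1\, \mathcal{W}\, \varphi_2$ also holds if $\varphi_1$ holds forever.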


Input and Output. The input of RVHyper consists of a HyperLTL formula and
the observed behavior of the system under consideration. The observed behavior
is represented as a trace set $T$, where each $t \in T$ represents a previously observed
execution of the system. If RVHyper detects that the system
violates the hyperproperty, it outputs a counterexample, i.e., a k-ary tuple of
traces, where k is the number of quantifiers in the HyperLTL formula.


Monitoring Algorithm. Given a HyperLTL formula $\varphi$ and a trace set $T$,
RVHyper processes a fresh trace as depicted in Algorithm 1.
The algorithm revolves around a monitor template $M_\varphi$, which is constructed
from the HyperLTL formula $\varphi$. The basic idea of the monitor template is that
it still contains every trace variable of $\varphi$, which can be initialized with explicit
traces at runtime. This way, the automaton for the monitor template
is constructed only once, as a preprocessing step.

RVHyper initializes the monitor template for each k-ary combination of
traces in $T \cup \{t\}$. If one tuple violates the hyperproperty, RVHyper returns that
k-ary tuple of traces as a counterexample; otherwise RVHyper returns satisfied.
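The loop of Algorithm 1 can be sketched in a few lines of Python. This is an illustration under our own assumptions, not RVHyper's C++ code: the instantiated monitor template is modeled as a plain Python predicate over a tuple of traces, each trace position is a dictionary of hypothetical signals "i" (input) and "o" (output), and observational determinism is evaluated only on the common prefix of two traces.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple, Union

Step = Dict[str, int]           # values of the observed signals at one position
Trace = Tuple[Step, ...]
Template = Callable[..., bool]  # stand-in for M_phi instantiated with n traces


def observational_determinism(a: Trace, b: Trace) -> bool:
    """(O_a = O_b) W (I_a != I_b), evaluated on the common prefix of a and b."""
    for sa, sb in zip(a, b):
        if sa["i"] != sb["i"]:
            return True   # inputs diverged: the weak-until obligation is discharged
        if sa["o"] != sb["o"]:
            return False  # outputs diverged while the inputs were still equal
    return True           # outputs agreed on the whole common prefix


def monitor(template: Template, n: int, traces: Sequence[Trace],
            fresh: Trace) -> Union[str, Tuple[Trace, ...]]:
    """Algorithm 1: check every n-tuple over T and {t}; return a violating tuple."""
    pool = list(traces) + [fresh]
    for candidate in product(pool, repeat=n):
        if not template(*candidate):
            return candidate  # counterexample: the tuple the monitor rejects
    return "satisfied"
```

For instance, two traces that share their inputs but disagree on the output at the second step are returned as a counterexample pair, while a trace whose input differs from the first position on never triggers a violation.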


Trace Analysis: Minimizing Trace Storage. The main obstacle in monitoring
hyperproperties is the potentially unbounded space consumption. RVHyper
uses a trace analysis to detect redundant traces with respect to a given
HyperLTL formula, i.e., traces that can be safely discarded without losing any
information and without losing the ability to return a counterexample.

RVHyper’s trace analysis is based on the definition of trace redundancy:
we say a fresh trace $t$ is $(T,\varphi)$-redundant if $T$ is a model of $\varphi$ if and only
if $T \cup \{t\}$ is a model of $\varphi$. The idea, depicted in Algorithm 2, is to check whether
another trace $t'$ contains at least as much information as $t$: we say $t'$ dominates
$t$ if $\bigwedge_{\pi \in \mathcal{V}} \mathcal{L}(M_\varphi[t'/\pi]) \subseteq \mathcal{L}(M_\varphi[t/\pi])$. For a fresh incoming trace, RVHyper
performs this language inclusion check in both directions in order to compute
the minimal trace set that must be stored to monitor the hyperproperty under
consideration.
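A small Python sketch of Algorithm 2 follows. RVHyper decides dominance by automata language inclusion; the sketch substitutes a bounded, brute-force check over an explicit finite universe of candidate traces (an assumption made purely for illustration), and it reuses the same hypothetical signal names and common-prefix reading of observational determinism as a stand-in monitor.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Step = Dict[str, int]
Trace = Tuple[Step, ...]


def observational_determinism(a: Trace, b: Trace) -> bool:
    """(O_a = O_b) W (I_a != I_b) on the common prefix of two finite traces."""
    for sa, sb in zip(a, b):
        if sa["i"] != sb["i"]:
            return True
        if sa["o"] != sb["o"]:
            return False
    return True


def dominates(template: Callable[..., bool], n: int, t_dom: Trace, t: Trace,
              universe: Sequence[Trace]) -> bool:
    """Bounded stand-in for L(M[t_dom/pi]) subset of L(M[t/pi]) for every pi:
    whatever the other n-1 slots hold, acceptance with t_dom implies acceptance with t."""
    for k in range(n):  # slot of the trace variable pi
        for others in product(universe, repeat=n - 1):
            with_dom = others[:k] + (t_dom,) + others[k:]
            with_t = others[:k] + (t,) + others[k:]
            if template(*with_dom) and not template(*with_t):
                return False
    return True


def minimize(template: Callable[..., bool], n: int, traces: Sequence[Trace],
             fresh: Trace, universe: Sequence[Trace]) -> List[Trace]:
    """Algorithm 2: keep a redundancy-free set of traces."""
    for u in traces:
        if dominates(template, n, u, fresh, universe):
            return list(traces)  # the fresh trace adds no information
    kept = [u for u in traces if not dominates(template, n, fresh, u, universe)]
    return kept + [fresh]


# Hypothetical universe: all traces of length 2 over one input and one output bit.
STEPS = [{"i": i, "o": o} for i in (0, 1) for o in (0, 1)]
UNIVERSE = list(product(STEPS, repeat=2))
```

A duplicate of a stored trace is discarded immediately, whereas two traces that can each expose a violation the other cannot are both kept.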


Specification Analysis: Decreasing Running Time. RVHyper uses a specification
analysis, a preprocessing step that analyzes the HyperLTL
formula under consideration. RVHyper detects whether a formula is (1) symmetric,
in which case we halve the number of instantiated monitors, (2) transitive, in which
case we reduce the number of instantiated monitors to two, or (3) reflexive, in which
case we can omit the self-comparison of traces [7].

Symmetry is especially interesting because many information-flow policies
satisfy this property. Consider, for example, observational determinism:
$\forall \pi. \forall \pi'. (O_\pi = O_{\pi'})\, \mathcal{W}\, (I_\pi \neq I_{\pi'})$. RVHyper detects symmetry by translating
this formula to a formula that is unsatisfiable if there exists no pair of traces
that violates the symmetry condition:
$\exists \pi. \exists \pi'. ((O_\pi = O_{\pi'})\, \mathcal{W}\, (I_\pi \neq I_{\pi'})) \not\leftrightarrow ((O_{\pi'} = O_\pi)\, \mathcal{W}\, (I_{\pi'} \neq I_\pi))$.
If the resulting formula turns out to be unsatisfiable,
RVHyper omits the symmetric instantiations of the monitor automaton, which,
especially in combination with RVHyper’s trace analysis, turns out to be a major
optimization in practice [7].
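The effect of the symmetry optimization on the number of monitor instantiations can be sketched as follows. RVHyper establishes symmetry with the satisfiability solver EAHyper; the Python sketch instead brute-forces the symmetry condition over a finite universe of short traces, which is an assumption for illustration only. If the binary template is symmetric, only one of (a, b) and (b, a) is instantiated.

```python
from itertools import combinations_with_replacement, product
from typing import Callable, Dict, List, Sequence, Tuple

Step = Dict[str, int]
Trace = Tuple[Step, ...]
Binary = Callable[[Trace, Trace], bool]


def observational_determinism(a: Trace, b: Trace) -> bool:
    """(O_a = O_b) W (I_a != I_b) on the common prefix of two finite traces."""
    for sa, sb in zip(a, b):
        if sa["i"] != sb["i"]:
            return True
        if sa["o"] != sb["o"]:
            return False
    return True


def is_symmetric(template: Binary, universe: Sequence[Trace]) -> bool:
    """Bounded stand-in for the unsatisfiability check: no pair (a, b) in the
    universe with template(a, b) != template(b, a)."""
    return all(template(a, b) == template(b, a) for a, b in product(universe, repeat=2))


def pairs_to_check(template: Binary, traces: Sequence[Trace],
                   universe: Sequence[Trace]) -> List[Tuple[Trace, Trace]]:
    """Monitor instantiations needed for the stored traces."""
    if is_symmetric(template, universe):
        return list(combinations_with_replacement(traces, 2))  # one of (a, b), (b, a)
    return list(product(traces, repeat=2))


STEPS = [{"i": i, "o": o} for i in (0, 1) for o in (0, 1)]
UNIVERSE = list(product(STEPS, repeat=2))
```

For three stored traces, the symmetric observational-determinism template needs 6 instantiations instead of 9; exploiting reflexivity as well would additionally drop the three self-comparisons.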


Implementation. RVHyper¹ is written in C++. It uses Spot for building the
deterministic monitor automata and the BuDDy BDD library for handling symbolic
constraints. We use the HyperLTL satisfiability solver EAHyper [5,6] to determine
whether the input formula is reflexive, symmetric, or transitive. Depending
on those results, we omit redundant tuples in the monitoring algorithm.


3 Detecting Spurious Dependencies in Hardware Designs 


While HyperLTL has been applied to a range of domains, including security and 
information flow properties, we focus in the following on a classical verification 
problem, the independence of signals in hardware designs. We demonstrate how 
RVHyper can automatically detect such dependencies from traces generated from 
hardware designs. 


1 The implementation is available at https://react.uni-saarland.de/tools/rvhyper/.
Input and Output. The input to RVHyper is a set of
traces where the propositions match the atomic propositions
of the HyperLTL formula. For the following experiments,
we generate a set of traces from the Verilog description
of several example circuits by random simulation. If
a set of traces violates the specification, RVHyper returns
a counterexample.


Specification. We consider the problem of detecting
whether input signals influence output signals in hardware
designs. We write $i \not\rightsquigarrow o$ to denote that the inputs $i$ do not
influence the outputs $o$. Formally, we specify this property
as the following HyperLTL formula:

$$\forall \pi_1 \forall \pi_2.\ (o_{\pi_1} = o_{\pi_2})\, \mathcal{W}\, (\bar{\imath}_{\pi_1} \neq \bar{\imath}_{\pi_2}),$$


Fig. 1. MUX circuit 
with black box 


where $\bar{\imath}$ denotes all inputs except $i$. Intuitively, the formula asserts that for every
pair of execution traces $(\pi_1, \pi_2)$ the value of $o$ has to be the same until
there is a difference between $\pi_1$ and $\pi_2$ in the input vector $\bar{\imath}$, i.e., the inputs on
which $o$ may depend.


Sample Hardware Designs. We apply RVHyper to traces generated from 
the following hardware designs. Note that, since RVHyper observes traces and 
treats the system that generates the traces as a black box, the performance of 
RVHyper does not depend on the size of the circuit. 


Example 1 (XOR). As a first example, consider the XOR function $o = i \oplus i'$.
In the corresponding circuit, every j-th output bit $o_j$ is only influenced by the
j-th input bits $i_j$ and $i'_j$.
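The independence formula can be checked on sampled traces with a few lines of Python. The sketch below rests on assumptions of our own: a 2-bit XOR with hypothetical signal names i0, i1 and i'0, i'1 for the bits of $i$ and $i'$, and o0, o1 for the output. It simulates the circuit and evaluates $(o_{\pi_1} = o_{\pi_2})\,\mathcal{W}\,(\bar{\imath}_{\pi_1} \neq \bar{\imath}_{\pi_2})$ on every pair of traces, where $\bar{\imath}$ is the set of all inputs except the one under test. It is not RVHyper's trace generator.

```python
import random
from itertools import product
from typing import Dict, Optional, Sequence, Tuple

Step = Dict[str, int]
Trace = Tuple[Step, ...]
INPUTS = ("i0", "i1", "i'0", "i'1")  # bits of the input vectors i and i'


def xor_step(inp: Dict[str, int]) -> Step:
    """Combinational 2-bit XOR: o_j = i_j XOR i'_j."""
    return {**inp, "o0": inp["i0"] ^ inp["i'0"], "o1": inp["i1"] ^ inp["i'1"]}


def random_trace(rng: random.Random, length: int) -> Trace:
    return tuple(xor_step({n: rng.randint(0, 1) for n in INPUTS}) for _ in range(length))


def independent(a: Trace, b: Trace, tested_input: str, output: str) -> bool:
    """(o_a = o_b) W (ibar_a != ibar_b), where ibar = all inputs except tested_input."""
    others = [n for n in INPUTS if n != tested_input]
    for sa, sb in zip(a, b):
        if any(sa[n] != sb[n] for n in others):
            return True   # some other input differs: o may legitimately differ later
        if sa[output] != sb[output]:
            return False  # o changed although only tested_input could have caused it
    return True


def find_violation(traces: Sequence[Trace], tested_input: str,
                   output: str) -> Optional[Tuple[Trace, Trace]]:
    for a, b in product(traces, repeat=2):
        if not independent(a, b, tested_input, output):
            return a, b
    return None
```

Because o0 is computed from i0 and i'0 only, no set of XOR traces can ever violate i1 ↛ o0, while two traces that differ only in i0 already violate i0 ↛ o0, matching the first two rows of Table 1.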


Example 2 (MUX). This example circuit is depicted in Fig. 1. There is a black
box combinatorial circuit, guarded by a multiplexer that selects between the two
input vectors $i$ and $i'$, and an inverse multiplexer that forwards the output of the
black box either towards $o$ or $o'$. Despite there being a syntactic dependency
between $o$ and $i'$, there is no semantic dependency, i.e., the output $o$ depends
solely on $i$ and the selector signal.

When using the same example, but with a sequential circuit as the black box,
there may be information flow from the input vector $i'$ to the output vector $o$
because the state of the latches may depend on it. We construct such a circuit
that leaks information about $i'$ via its internal state.


Example 3 (counter). Our last example is a binary counter with two input
control bits incr and decr that increment and decrement the counter, respectively. The
corresponding Verilog design is shown in Fig. 2. The counter has a single output,
namely a signal that is set to one when the counter value overflows. Both
inputs influence the output, but the timing of the overflow depends on the number
of counter bits.
module counter(increase, decrease, overflow);
  input increase;
  input decrease;
  output overflow;

  reg [2:0] counter;

  assign overflow = (counter == 3'b111 && increase
                     && !decrease);

  initial
  begin
    counter = 0;
  end

  always @($global_clock)
  begin
    if (increase && !decrease)
      counter = counter + 1;
    else if (!increase && decrease
             && counter > 0)
      counter = counter - 1;
    else
      counter = counter;
  end
endmodule


Fig. 2. Verilog description of Example 3 (counter). 
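The counter of Fig. 2 can be mirrored by a small Python model to see why both control inputs influence overflow. The sketch assumes a cycle-based reading of the Verilog (overflow is sampled from the current counter value and inputs before the clocked update, and the 3-bit register wraps modulo 8); the function names are hypothetical. Two traces that agree on decr but differ on incr already violate incr ↛ overflow once one of them has counted up to 7.

```python
from typing import Dict, Sequence, Tuple

Step = Dict[str, int]
Trace = Tuple[Step, ...]


def simulate_counter(inputs: Sequence[Tuple[int, int]], bits: int = 3) -> Trace:
    """Cycle-based model of Fig. 2; inputs are (increase, decrease) pairs."""
    top = (1 << bits) - 1  # 3'b111 for a 3-bit counter
    counter, trace = 0, []
    for inc, dec in inputs:
        overflow = int(counter == top and inc == 1 and dec == 0)
        trace.append({"incr": inc, "decr": dec, "overflow": overflow})
        if inc and not dec:
            counter = (counter + 1) & top  # reg [2:0] wraps around
        elif not inc and dec and counter > 0:
            counter -= 1
    return tuple(trace)


def independent(a: Trace, b: Trace, tested_input: str, output: str,
                inputs: Tuple[str, ...] = ("incr", "decr")) -> bool:
    """(o_a = o_b) W (ibar_a != ibar_b), where ibar = all inputs except tested_input."""
    others = [n for n in inputs if n != tested_input]
    for sa, sb in zip(a, b):
        if any(sa[n] != sb[n] for n in others):
            return True
        if sa[output] != sb[output]:
            return False
    return True
```

Eight increments raise overflow exactly at the last step, so pairing that trace with an idle one (same decr values) refutes incr ↛ overflow, and pairing it with a trace that also asserts decr in the last step refutes decr ↛ overflow.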


Table 1. Results of RVHyper on traces generated from circuit instances. Every instance
was run 10 times with different seeds and the average is reported.


Instance | Property | Satisfied | # traces | Length | Time | # instances
XOR | $i_0 \not\rightsquigarrow o_0$ | no | 18 | 5 | 12 ms | 222
XOR | $i_1 \not\rightsquigarrow o_0$ | yes | 1000 | 5 | 16913 ms | 499500
counter | incr $\not\rightsquigarrow$ overflow | no | 1636 | 20 | 28677 ms | 1659446
counter | decr $\not\rightsquigarrow$ overflow | no | 1142 | 20 | 15574 ms | 887902
MUX | $i' \not\rightsquigarrow o$ | yes | 1000 | 5 | 14885 ms | 499500
MUX2 | $i' \not\rightsquigarrow o$ | no | 82 | 5 | 140 ms | 3704


Results. The results of multiple random simulations are given in Table 1.
Despite the high complexity of the monitoring problem, RVHyper is able to
scale up to thousands of input traces with millions of monitor instantiations
(cf. Algorithm 1). RVHyper’s optimizations, i.e., keeping only a minimal set of
traces and reducing the number of instances by the specification analysis, are
a key factor in those results. For the two instances where the property is
satisfied (XOR and MUX), RVHyper has not found a violation in any of the runs.
For instances where the property is violated, RVHyper is able to find
counterexamples. While counterexamples can be found quickly for XOR and MUX2, the
counter instances need more traces, since the chance of finding a violating pair
of traces is lower.


4 Conclusion 


RVHyper monitors a running system for violations of a HyperLTL specification.
The functionality of RVHyper thus complements model checking tools for
HyperLTL, like MCHyper [8], and tools for satisfiability checking, like
EAHyper [6]. RVHyper is particularly useful during the development of a HyperLTL
specification, where it can be used to check the HyperLTL formula on sample
traces without the need for a complete model. Based on the feedback of the tool,
the user can refine the HyperLTL formula until it captures the intended policy.
Abstract. We present the Refinement Calculus of Reactive Systems
Toolset, an environment for compositional modeling and reasoning about
reactive systems, built on top of Isabelle, Simulink, and Python.


1 Introduction 


The Refinement Calculus of Reactive Systems (RCRS) is a compositional
framework for modeling and reasoning about reactive systems. RCRS has been inspired
by component-based frameworks such as interface automata [3] and has its
origins in the theory of relational interfaces [14]. The theory of RCRS has been
introduced in [13] and is thoroughly described in [11].

RCRS comes with a publicly available toolset, the RCRS toolset (Fig. 1), 
which consists of: 


— A full implementation of RCRS in the Isabelle proof assistant [9]. 

— A set of analysis procedures for RCRS components, implemented on top of 
Isabelle and collectively called the Analyzer. 

— A Translator of Simulink diagrams into RCRS code. 

— A library of basic RCRS components, including a set of basic Simulink blocks 
modeled in RCRS. 


An extended version of this paper contains an additional six-page appendix
describing a demo of the RCRS toolset [6]. The extended paper can also be
found in a figshare repository [7]. The figshare repository contains all data (code
and models) required to reproduce all results of this paper as well as of [6]: see
Section “Data Availability Statement” for more details. The RCRS toolset can
also be downloaded from the RCRS web page: http://rcrs.cs.aalto.fi/.
[Figure: a Simulink diagram is passed to the Translator (with options such as the translation strategy), which produces an RCRS model of the diagram; the Analyzer (built on top of the Isabelle theorem prover), using the RCRS theory and component library (Isabelle theory files), performs incompatibility detection, internal variable elimination, auto-generation of a top-level contract, and refinement checking.]


Fig. 1. The RCRS toolset. 


2 Modeling Systems in RCRS 


RCRS provides a language of components to model systems in a modular
fashion. Components can be either atomic or composite. Here are some examples of
atomic RCRS components:


definition "Id = [: x ↝ y. y = x :]"
definition "Add = [: (x, y) ↝ z. z = x + y :]"

definition "Constant c = [: x::unit ↝ y. y = c :]"
definition "UnitDelay = [: (x,s) ↝ (y,s’) …

definition "SqrRoot = {. x. x > 0 .} o …
definition "NonDetSqrt = {. x. x > 0 .} …
definition "ReceptiveSqrt = [: x ↝ y. x …
definition "A = {. x. □◇x .} o [: x ↝ …


Id models the identity function: it takes input x and returns y such that
y = x. Add returns the sum of its two inputs. Constant is parameterized by c,
takes no input (equivalent to saying that its input variable is of type unit), and
returns an output which is always equal to c. UnitDelay is a stateful component:
s is the current-state variable and s’ is the next-state variable. SqrRoot is a
non-input-receptive component: its input x is required to satisfy x > 0. (SqrRoot
may be considered non-atomic, as it is defined as the serial composition of two
predicate transformers; see Sect. 3.) NonDetSqrt is a non-deterministic version
of SqrRoot: it returns an arbitrary (but non-negative) y, and not necessarily
the square root of x. ReceptiveSqrt is an input-receptive version of SqrRoot:
it accepts negative inputs, but may return an arbitrary output for such inputs.
RCRS also allows one to describe components using the temporal logic QLTL, an
extension of LTL with quantifiers [11]. An example is component A above. A
accepts an infinite input sequence of x’s, provided x is infinitely often true, and
returns a (non-deterministic) output sequence which satisfies the same property.

Composite components are formed by composing other (atomic or composite)
components using three primitive composition operators, as illustrated in Fig. 2:
C o C’ (in series) connects outputs of C to inputs of C’; C ** C’ (in parallel)
“stacks” C and C’ “on top of each other”; and feedback(C) connects the first
output of C to its first input. These operators are sufficient to express any block
diagram, as described in Sect. 4.

[Figure panels: (a) serial: C o C’; (b) parallel: C ** C’; (c) feedback: feedback(C).]

Fig. 2. The three composition operators of RCRS.


3 The Implementation of RCRS in Isabelle 


RCRS is fully implemented in the Isabelle theorem prover. The RCRS
implementation currently consists of 22 Isabelle theories (.thy files), totalling 27588
lines of Isabelle code. Some of the main theories are described next.

Theory Refinement.thy (1209 lines) contains a standard implementation of
refinement calculus [1]. Systems are modeled as monotonic predicate
transformers [4] with a weakest precondition interpretation. Within this theory we
implemented non-deterministic and deterministic update statements, assert
statements, parallel composition, refinement and other operations, and proved the
necessary properties of these.
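As an illustration of the predicate-transformer view (not of the Isabelle theory itself), the following Python sketch models components over small finite domains. A predicate is the set of values satisfying it, {.p.} and [:r:] are transformers returning weakest preconditions, serial composition is function composition, and S ⊑ T (T refines S) holds when S(q) ⊆ T(q) for every postcondition q. The domains, the integer square root, and the Python names are assumptions made for the example; the precondition x > 0 mirrors the SqrRoot definition in Sect. 2, and the receptive variant is assumed to allow any output when x ≤ 0.

```python
from itertools import combinations
from math import isqrt
from typing import Callable, FrozenSet, Iterable, Iterator

Pred = FrozenSet[int]
PT = Callable[[Pred], Pred]  # postcondition -> weakest precondition

X = range(-2, 10)  # hypothetical input domain
Y = range(0, 4)    # hypothetical output domain (covers isqrt(x) for x in X)


def assert_(p: Callable[[int], bool]) -> PT:
    """{. p .}: behaves like skip where p holds and fails elsewhere."""
    return lambda q: frozenset(x for x in q if p(x))


def update(r: Callable[[int, int], bool], xs: Iterable[int], ys: Iterable[int]) -> PT:
    """[: r :]: demonic choice of any output y with r(x, y)."""
    xs, ys = list(xs), list(ys)
    return lambda q: frozenset(x for x in xs if all(y in q for y in ys if r(x, y)))


def serial(s: PT, t: PT) -> PT:
    """s o t: run s, then t (weakest preconditions compose backwards)."""
    return lambda q: s(t(q))


def subsets(values: Iterable[int]) -> Iterator[Pred]:
    vals = list(values)
    for k in range(len(vals) + 1):
        for combo in combinations(vals, k):
            yield frozenset(combo)


def refines(s: PT, t: PT, ys: Iterable[int]) -> bool:
    """s is refined by t iff s(q) is a subset of t(q) for every postcondition q over ys."""
    return all(s(q) <= t(q) for q in subsets(ys))


sqrt_root = serial(assert_(lambda x: x > 0),
                   update(lambda x, y: x > 0 and y == isqrt(x), X, Y))
non_det_sqrt = serial(assert_(lambda x: x > 0), update(lambda x, y: y >= 0, X, Y))
receptive_sqrt = update(lambda x, y: x <= 0 or y == isqrt(x), X, Y)
```

On these domains NonDetSqrt ⊑ SqrRoot ⊑ ReceptiveSqrt: removing non-determinism refines a component, and so does accepting more inputs, while the reverse refinement SqrRoot ⊑ NonDetSqrt fails.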

Theory RefinementReactive.thy (1144 lines) extends Refinement.thy to
reactive systems by introducing predicates over infinite traces in addition to
predicates over values, and property transformers in addition to predicate
transformers [11,13].

Theory Temporal.thy (788 lines) implements a semantic version of QLTL,
where temporal operators are interpreted as predicate transformers. For example,
the operator □, when applied to the predicate on infinite traces
(x > 0) : (nat → real) → bool, returns another predicate on infinite traces
□(x > 0) : (nat → real) → bool. Temporal operators have been implemented to be polymorphic
in the sense that they apply to predicates over an arbitrary number of variables.

Theory Simulink.thy (873 lines) defines a subset of the basic blocks in the
Simulink library as RCRS components (at the time of writing, 48 Simulink block
types can be handled). In addition to discrete-time blocks, we can handle
continuous-time blocks with a fixed-step forward Euler integration scheme. For example,
Simulink’s integrator block can be defined in two equivalent ways as follows:


definition "Integrator dt = [- (x,s) ↝ (s, s+x*dt) -]"
definition "Integrator dt = [: (x,s) ↝ (y,s’). y=s ∧ s’=s+x*dt :]"


204 I. Dragomir et al.


The syntax [- x ↝ f(x) -] assumes that f is a function, whereas [: :] can
also be used for relations (i.e., non-deterministic systems). Using the former instead
of the latter to describe deterministic systems helps the Analyzer perform
simplifications (see Sect. 5).
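Read as a stateful step function, [- (x,s) ↝ (s, s+x*dt) -] outputs the current state and stores s + x·dt as the next state, i.e., one forward-Euler step. The hypothetical Python rendering below (not the simulation code generated by the toolset) makes the one-step delay of the output visible.

```python
from typing import Callable, List, Sequence, Tuple

StepFn = Callable[[float, float], Tuple[float, float]]  # (input, state) -> (output, next state)


def integrator(dt: float) -> StepFn:
    """Integrator dt = [- (x, s) ~> (s, s + x*dt) -]: output s, next state s + x*dt."""
    return lambda x, s: (s, s + x * dt)


def run(step: StepFn, xs: Sequence[float], s0: float = 0.0) -> List[float]:
    """Simulate a deterministic STS component from the initial state s0."""
    outputs, s = [], s0
    for x in xs:
        y, s = step(x, s)
        outputs.append(y)
    return outputs
```

Integrating the constant input 1.0 with dt = 0.5 yields the outputs 0.0, 0.5, 1.0, 1.5: each output reflects the inputs received before the current step.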

Theory SimplifyRCRS.thy (2175 lines) implements several of the Analyzer’s 
procedures. In particular, it contains a simplification procedure which reduces 
composite RCRS components into atomic ones (see Sect. 5). 

In addition to the above, there are several theories that contain a proof of
correctness of our block-diagram translation strategies (see Sect. 4 and [10]),
deal with Simulink types [12], generate Python simulation code, and many
more. A detailed description of all these theories and graphs depicting their
dependencies is included in the documentation of the toolset.

The syntax of RCRS components is implemented in Isabelle using a shallow
embedding [2]. This has the advantage that all datatypes and other mechanisms
of Isabelle (e.g., renaming) are available for component specification, but also
the disadvantage that properties and simplifications of the RCRS language cannot
be expressed within Isabelle, as discussed in [11]. A deep embedding, in
which the syntax of components is defined as a datatype of Isabelle, is possible
and is left as future work.


4 The Translator 


The Translator, called simulink2isabelle, translates hierarchical block
diagrams (HBDs), and in particular Simulink models, into RCRS theories [5]. The
Translator (implemented in about 7100 lines of Python code) takes as input
a Simulink model (.slx file) and a list of options, and generates as output an
Isabelle theory (.thy file). The output file contains: (1) the definition of all
instances of basic blocks in the Simulink diagram (e.g., all Adders, Integrators,
Constants, etc.) as atomic RCRS components; (2) the bottom-up definition of
all subdiagrams as composite RCRS components; (3) calls to simplification
procedures; and (4) theorems stating that the resulting simplified components are
equivalent to the original ones. The .thy file may also contain additional content
depending on user options, as explained below.

As shown in [5], there are many possible ways to translate a block diagram
into an algebra of components with the three primitive composition operators
of RCRS. This means that step (2) above is not unique. simulink2isabelle
implements the various translation strategies proposed in [5] as user options.

For example, when run on the Simulink diagram of Fig. 3, the Translator produces
a file similar to the one shown in Fig. 4. IC_Model and FP_Model are composite
RCRS components generated automatically w.r.t. two different translation
strategies, implemented by user options -ic and -fp. The simplify_RCRS construct
is explained in Sect. 5.

[Figure: Simulink diagram with a UnitDelay block and output port Out.]

Fig. 3. A Simulink diagram.

Other user options to the Translator include: whether to flatten the input
diagram, optional typing information for wires, and whether to generate, in addition
to the top-level STS component, a QLTL component representing the temporal
behavior of the system. The user can also ask the Translator to generate:
(1) components w.r.t. all translation strategies; (2) the corresponding theorems
showing that these components are all semantically equivalent; and (3) Python
simulation scripts for the top-level component.


theory Summation imports ...
begin
named_theorems basic_simps
lemmas basic_simps = simulink_simps
definition [basic_simps]: "Split = [- a ↝ a, a -]"
definition [basic_simps]: "Add = [- f, g ↝ f + g -]"
definition [basic_simps]: "UnitDelay = [- d, s ↝ s, d -]"
simplify_RCRS "IC_Model = feedback([- f, g, s ↝ (f, g), s -] o
    (Add ** Id) o UnitDelay o (Split ** Id) o
    [- (f, h), s’ ↝ f, h, s’ -])"
  "(g s)" "(h s’)"
simplify_RCRS "FP_Model = feedback (feedback (feedback ([- f, d, a, g, s
    ↝ (f, g), (d, s), a -] o (Add ** UnitDelay ** Split) o
    [- d, (a, s’), (f, h) ↝ f, d, a, h, s’ -])))"
  "(g s)" "(h s’)"
end


Fig. 4. Auto-generated Isabelle theory for the Simulink diagram of Fig. 3 


5 The Analyzer 


The Analyzer is a set of procedures implemented on top of Isabelle and
ML, the programming language of Isabelle. These procedures implement a
set of functionalities such as simplification, compatibility checking, refinement
checking, etc. Here we describe the main functionalities, implemented by the
simplify_RCRS construct. As illustrated in Fig. 4, the general usage of this
construct is simplify_RCRS "Model = C" "in" "out", where C is a (generally
composite) component and in, out are (tuples of) names for its input and output
variables. When such a statement is executed in Isabelle, it performs the following
steps: (1) It creates the definition Model = C. (2) It expands C, meaning
that it replaces all atomic components and all composition operators in C with
their definitions. This results in an Isabelle expression E. E is generally a
complicated expression, containing formulas with quantifiers, case expressions for
tuples, function compositions, and several other operators. (3) simplify_RCRS
simplifies E by eliminating quantifiers, renaming variables, and performing
several other simplifications. The simplified expression, F, is of the form {.p.} o
[:r:], where p is a predicate on input variables and r is a relation on input and
output variables. That is, F is an atomic RCRS component. (4) simplify_RCRS
generates a theorem stating that Model is semantically equivalent to F, and also
the mechanized proof of this theorem (in Isabelle). Note that the execution by the
Analyzer of the .thy file generated by the Translator is fully automatic, despite
the fact that Isabelle generally requires human interaction. This is because
the theory generated by the Translator contains all declarations
(equalities, rewriting rules, etc.) necessary for the Analyzer to produce the
simplifications and their mechanical proofs without user interaction.

For example, when the theory in Fig. 4 is executed, the following theorem is 
generated and proved automatically: 


Model = [- (g, s) ↝ (s, s + g) -]


where Model is either IC_Model or FP_Model. The rightmost expression is the 
automatically generated simplification of the top-level system to an atomic 
RCRS component. 

If the model contains incompatibilities, where for instance the input condition of a block like SqrRoot cannot be guaranteed by the upstream diagram, the top-level component automatically simplifies to ⊥ (i.e., false). Thus, in this usage scenario, RCRS can be seen as a static analysis and behavioral type checking and inference tool for Simulink.
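The simplification and the compatibility check described in this section can be illustrated outside Isabelle. The following Python sketch is not the Analyzer's Isabelle/ML implementation: it models an atomic component {.p.} o [:r:] over finite value sets as a precondition p and a relation r, and simplifies a serial composition of two such components into a single atomic component using the refinement-calculus law {.p1.} o [:r1:] o {.p2.} o [:r2:] = {.p1 ∧ (∀y. r1 x y ⇒ p2 y).} o [:r1 ; r2:]. The class Atomic, the function serial, and the toy blocks (including a SqrRoot-like block with input condition y ≥ 0) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class Atomic:
    """Atomic component {.pre.} o [:rel:] over a finite input domain (sketch)."""
    dom: FrozenSet[int]                   # finite set of candidate inputs
    pre: Callable[[int], bool]            # input condition p
    rel: FrozenSet[Tuple[int, int]]       # input-output relation r

    def outputs(self, x):
        return {y for (a, y) in self.rel if a == x}


def serial(c1: Atomic, c2: Atomic) -> Atomic:
    """Simplify the serial composition of c1 and c2 to one atomic component."""
    # New precondition: p1 holds and every r1-output satisfies p2.
    ok = frozenset(x for x in c1.dom
                   if c1.pre(x) and all(c2.pre(y) for y in c1.outputs(x)))
    # New relation: the relational composition r1 ; r2.
    rel = frozenset((x, z) for (x, y) in c1.rel
                    for (y2, z) in c2.rel if y == y2)
    return Atomic(c1.dom, lambda x: x in ok, rel)


D = frozenset(range(-3, 4))     # inputs of the first block
Y = frozenset(range(-10, 10))   # inputs of the SqrRoot-like block

square = Atomic(D, lambda x: True, frozenset((x, x * x) for x in D))
negsq = Atomic(D, lambda x: True, frozenset((x, -x * x - 1) for x in D))
sqroot = Atomic(Y, lambda y: y >= 0,
                frozenset((y, math.isqrt(y)) for y in Y if y >= 0))

good = serial(square, sqroot)   # upstream always guarantees y >= 0
bad = serial(negsq, sqroot)     # upstream always violates y >= 0
```

Here good keeps a precondition that holds on every input and maps x to |x|, whereas the precondition of bad is false on every input, the analogue of the top-level component simplifying to ⊥.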


6 Case Study 


We have used the RCRS toolset on several case studies, the most significant of which is a real-world benchmark provided by Toyota [8]. The benchmark consists of a set of Simulink diagrams modeling a Fuel Control System.¹ A typical diagram in this suite contains 3 levels of hierarchy, 104 Simulink blocks in total (8 of which are subsystems), and 101 wires (8 of which are feedbacks, the most complex composition operator in RCRS). Using the Translator on this diagram results in a .thy file of 1671 lines and 57037 characters. Translation time is negligible. The Analyzer simplifies this model to a top-level atomic STS component with no inputs, 7 (external) outputs, and 14 state variables (note that all internal wires have been automatically eliminated in this top-level description). Simplification takes approximately 15 seconds and generates a formula which is 8337 characters long. The formula is consistent (not false), which proves statically that the original Simulink diagram has no incompatibilities. More details about the case study can be found in [5,6].


1 We downloaded the Simulink models from https://cps-vo.org/group/ARCH/benchmarks. One of those models is made available in the figshare repository [7] (see also Section “Data Availability Statement”).
7 Data Availability Statement


All results mentioned in this paper as well as in the extended version of this 
paper [6] are fully reproducible using the code, data, and instructions available 
in the figshare repository: https://doi.org/10.6084/m9.figshare.5900911.v1. 

The figshare repository contains the full implementation of the RCRS toolset, including the formalization of RCRS in Isabelle, the Analyzer, the RCRS Simulink library, and the Translator. The figshare repository also contains sample Simulink models, including the Toyota model discussed in Sect. 6, a demo file named RCRS_Demo.thy, and detailed step-by-step instructions on how to conduct a demonstration and how to reproduce the results of this paper. Documentation on RCRS is also provided.

The figshare repository provides a snapshot of RCRS as of February 2018. Further developments of RCRS will be reflected on the RCRS web page: http://rcrs.cs.aalto.fi/.
Abstract. We present TESTOR, a tool for on-the-fly conformance test case generation, guided by test purposes. Concretely, given a formal specification of a system and a test purpose, TESTOR automatically generates test cases, which assess, using black-box testing techniques, the conformance of a system under test to the specification. In this context, a test purpose describes the goal states to be reached by the test and enables one to indicate parts of the specification that should be ignored during the testing process. Compared to the existing tool TGV, TESTOR has a more modular architecture, based on generic graph transformation components, is capable of extracting a test case completely on the fly, and enables a more flexible expression of test purposes, taking advantage of the multiway rendezvous. TESTOR has been implemented on top of the CADP verification toolbox, and evaluated on three published case studies and more than 10000 examples taken from the non-regression test suites of CADP.


1 Introduction 


Model-Based Testing [7] is a validation technique taking advantage of a model of a system (both requirements and behavior) to automate the generation of relevant test cases. This technique is suitable for complex industrial systems, such as embedded systems [45] and automotive software [35]. Using formal models for testing is required for certification of safety-critical systems [36]. Conformance testing aims at extracting from a formal model of a system a set of test cases to assess whether an actual implementation of the system under test (SUT) conforms to the model, using black-box testing techniques (i.e., without knowledge of the actual code of the SUT). This approach is particularly suited for nondeterministic concurrent systems, where the behavior of the SUT can be observed and controlled by a tester only via dedicated interfaces, named points of control and observation.

Often, the formal model is an IOLTS (Input/Output Labeled Transition System), where transitions between states of the system are labeled with an action classified as input, output, or internal (i.e., unobservable, usually denoted by τ).


* Institute of Engineering Univ. Grenoble Alpes. 
In this setting, the most prominent conformance relation is input-output conformance (ioco) [39,41]. The theory underlying ioco is well established, implemented in several tools [1,2,22,25,28], and still actively used, as witnessed by a series of recent case studies [9,10,20,27,38].

As regards asynchronous systems, i.e., systems consisting of concurrent processes with message-passing communication, there exist two different approaches to model-based conformance testing: coverage-oriented approaches run the test(s) to stimulate the SUT until a coverage goal has been reached, whereas test-purpose-guided approaches use test suites, each test of which terminates with a verdict (passed, failed, or inconclusive). The generation of tests from the model can be carried out offline, before executing them against the SUT, or online [28], during their execution, by combining the exploration of the model and the interaction with the SUT.

In this paper, we present TESTOR, a tool for on-the-fly conformance test case generation guided by test purposes, which, following the approach of TGV [25], characterize some state(s) of the model as accepting. The generated test cases are automata that attempt to drive the SUT towards these states. TESTOR extends the algorithms of TGV to extract test cases completely on the fly (i.e., during test case execution against the SUT), making TESTOR suitable for online testing. TESTOR is constructed following a modular architecture based on generic, recent, and optimized graph manipulation components. This architecture also makes the description of test purposes more convenient, by replacing the specific synchronous product of TGV and taking advantage of the multiway rendezvous [18,23], a powerful primitive to express communication and synchronization among a set of distributed processes. TESTOR was built on top of the OPEN/CAESAR [15] generic environment for on-the-fly graph manipulation provided by the CADP [16] verification toolbox.

The remainder of the paper is organized as follows. Section 2 recalls the essential notions of the underlying theory. Section 3 presents the architecture, main algorithms, and implementation of TESTOR, and gives some examples. Section 4 describes various experiments to validate TESTOR and compare it to TGV. Section 5 compares TESTOR to existing test generation approaches. Finally, Sect. 6 gives some concluding remarks and future work directions.


2 Background: Essential Definitions of [25] 


Conformance testing checks that a SUT behaves according to a formal reference model (M), which is used as an oracle. We use Input-Output Labelled Transition Systems (IOLTS) [25] to represent the behavior of the model M. We assume that the behavior of the SUT can also be represented as an IOLTS, even if it is unknown (the so-called testing hypothesis [25]). An IOLTS $(Q, A, T, q_0)$ consists of a set of states $Q$, a set of actions $A$, a transition relation $T \subseteq Q \times A \times Q$, and an initial state $q_0 \in Q$. The set of actions is partitioned as $A = A_I \cup A_O \cup \{\tau\}$, where $A_I$ and $A_O$ are the subsets of input and output actions, and $\tau$ is the internal (unobservable) action. A transition $(q_1, b, q_2) \in T$ (also noted $q_1 \xrightarrow{b} q_2$) indicates that the system can move from state $q_1$ to state $q_2$ by performing action $b$. Input (resp. output) actions are noted ?a (resp. !a). In the sequel, we consider the same running example as [25], whose IOLTS model M is shown on Fig. 1(a).

[Figure: (a) model M; (b) test purpose TP; (c) visible behaviour $SP_{vis}$, complete test graph CTG (gray), and a test case TC (dark gray)]

Fig. 1. Example of test case selection (taken from [25])

Input actions of the SUT are controllable by the environment, whereas output actions are only observable. Testing allows one to observe the execution traces of the SUT, and also to detect quiescence, i.e., the presence of deadlocks (states without successors), outputlocks (states without outgoing output actions), or livelocks (cycles of internal actions). The quiescence present in an IOLTS L (either the model M or the SUT) is modeled by a suspension automaton $\Delta(L)$, an IOLTS obtained from L by adding self-loops labeled by a special output action $\delta$ on the quiescent states. The SUT conforms to the model M modulo the ioco relation [40] if, after executing each trace of $\Delta(M)$, the suspension automaton $\Delta(SUT)$ exhibits only those outputs and quiescences that are allowed by the model. Since two sequences having the same observable actions (including quiescence) cannot be distinguished, the suspension automaton $\Delta(M)$ must be determinized before generating tests.
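The construction of the suspension automaton can be sketched as follows. This Python fragment is only an illustration under stated assumptions: an IOLTS is a dictionary mapping each state to a list of (label, target) pairs, inputs and outputs are distinguished by the prefixes ? and !, the label "tau" stands for τ and "!delta" for δ, and a state is taken as quiescent when it has neither output nor τ successors (which covers deadlocks and outputlocks) or lies on a τ-cycle (livelock). The names suspend and on_tau_cycle are hypothetical.

```python
TAU, DELTA = "tau", "!delta"


def on_tau_cycle(lts, s):
    """True if s can reach itself through one or more tau-transitions."""
    stack = [t for (a, t) in lts[s] if a == TAU]
    seen = set()
    while stack:
        u = stack.pop()
        if u == s:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(t for (a, t) in lts[u] if a == TAU)
    return False


def suspend(lts):
    """Return Delta(L): a copy of L with delta self-loops on quiescent states."""
    result = {}
    for s, succ in lts.items():
        # Deadlock or outputlock: no output and no internal move is possible.
        blocked = not any(a.startswith("!") or a == TAU for (a, _) in succ)
        loop = [(DELTA, s)] if blocked or on_tau_cycle(lts, s) else []
        result[s] = list(succ) + loop
    return result


# 0: outputlock, 1: can output, 2: deadlock, 3 and 4: tau-livelock
L = {0: [("?a", 1)], 1: [("!x", 2), (TAU, 3)], 2: [],
     3: [(TAU, 4)], 4: [(TAU, 3)]}
quiescent = [s for s, succ in suspend(L).items() if (DELTA, s) in succ]
```

Only state 1, which can still produce an output, receives no δ-loop; determinizing the result is a separate step.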

The test generation technique of TGV is based upon test purposes, which allow one to guide the selection of test cases. A test purpose for a model $M = (Q^M, A^M, T^M, q_0^M)$ is a deterministic and complete (i.e., in each state all actions are accepted) IOLTS $TP = (Q^{TP}, A^{TP}, T^{TP}, q_0^{TP})$, with the same actions as the model: $A^{TP} = A^M$. TP is equipped with two sets of trap states $Accept^{TP}$ and $Refuse^{TP}$, which are used to select desired behaviors and to cut the exploration of M, respectively. In the TP shown on Fig. 1(b), the desired behavior consists of an action !y followed by !z and is specified by the accepting state $q_3$; notice that the occurrence of an action !z before a !y is forbidden by the refusal state $q_2$. In a TP, a special transition of the form $q \xrightarrow{*} q'$ is an abbreviation for the complement set of all other outgoing transitions of q. These *-transitions facilitate the definition of a test purpose (which has to be a complete IOLTS) by avoiding the need to explicitly enumerate all possible actions for all states. Test purposes are used to mark the accepting and refusal states in the IOLTS of the model M. In TGV, this annotation is computed by a synchronous product [25, Definition 8] $SP = M \times TP$. Notice that SP preserves all behaviors of the model M because TP is complete and the synchronous product takes into account the special *-transitions. When computing SP, TGV implicitly adds a self-looping *-transition to each state of the TP with an incomplete set of outgoing transitions. To keep only the visible behaviors and quiescence, SP is suspended and determinized, leading to $SP_{vis} = det(\Delta(SP))$. Figure 1(c) shows an excerpt of $SP_{vis}$ limited to the first accepting and refusal states reachable from $q_0$.
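The synchronous product with *-transitions can be sketched as a breadth-first exploration of pairs of states. In this hypothetical Python sketch (function and parameter names are illustrative, not TGV's or TESTOR's code), TP is deterministic, so each TP state maps labels to a unique target; a model action is matched by an explicit TP transition if one exists, otherwise by a *-transition, and otherwise by the implicit self-looping *-transition. Exploration stops at accepting and refusal states, which are traps. The toy M and TP only mimic the shape of Fig. 1, where !y followed by !z is accepted and !z before !y is refused.

```python
from collections import deque

STAR = "*"


def sync_product(m, m0, tp, tp0, accept, refuse):
    """Explicit construction of SP = M x TP (sketch of TGV-style rules)."""
    init = (m0, tp0)
    sp, todo = {init: []}, deque([init])
    while todo:
        state = todo.popleft()
        qm, qt = state
        if qt in accept or qt in refuse:
            continue                       # trap state: cut the exploration
        explicit = dict(tp.get(qt, []))    # TP is deterministic: label -> target
        for a, qm2 in m[qm]:
            if a in explicit:
                qt2 = explicit[a]          # TP follows action a explicitly
            elif STAR in explicit:
                qt2 = explicit[STAR]       # "*" stands for all other actions
            else:
                qt2 = qt                   # implicit self-looping *-transition
            succ = (qm2, qt2)
            sp[state].append((a, succ))
            if succ not in sp:
                sp[succ] = []
                todo.append(succ)
    return sp


M = {0: [("?a", 1)], 1: [("!y", 2), ("!z", 3)], 2: [("!z", 4)], 3: [], 4: []}
TP = {0: [("!y", 1), ("!z", 2)], 1: [("!z", 3)]}   # 2 refuses, 3 accepts
SP = sync_product(M, 0, TP, 0, accept={3}, refuse={2})
```

The initial ?a is absorbed by the implicit self-loop of the TP, so SP keeps that behavior of M while marking (4, 3) as accepting and (3, 2) as refused.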

A test case is an IOLTS $TC = (Q^{TC}, A^{TC}, T^{TC}, q_0^{TC})$ equipped with three sets of trap states $Pass$, $Fail$, and $Inconc$, with $Pass \cup Fail \cup Inconc \subseteq Q^{TC}$, denoting verdicts. The actions of TC are partitioned into $A_I^{TC}$ and $A_O^{TC}$ subsets¹. A test case TC must be controllable, meaning that in every state, no choice is allowed between two inputs or an input and an output (i.e., the tester must either inject a single input to the SUT, or accept all the outputs of the SUT). Intuitively, a TC denotes a set of traces containing visible actions and quiescence that should be executable by the SUT to assess its conformance with the model M and a test purpose TP. From every state of the TC, a verdict must be reachable: Pass indicates that TP has been fulfilled, Fail indicates that the SUT does not conform to M, and Inconc indicates that correct behavior has been observed but TP cannot be fulfilled. An example of TC (dark gray states) is shown on Fig. 1(c). Pass verdicts correspond to accepting states (e.g., $q_{11}$). Inconclusive verdicts correspond either to refusal states (e.g., $q_4$ or gg) or to states from which no accepting state is reachable (e.g., state $q_{10}$). Fail verdicts, not displayed on the figure, are reached from every state when the SUT exhibits an output action (or a quiescence) not specified in the TC (e.g., an action !z or a quiescence in state $q_1$).

In general, there are several test cases that can be generated from a given model and test purpose. The union of these test cases forms the Complete Test Graph (CTG), which is an IOLTS having the same characteristics as a TC except for controllability. Figure 1(c) shows the CTG (light and dark gray states) corresponding to M and TP, which is not controllable (e.g., in state $q_5$ the two input actions ?a and ?b are possible). Formally, a CTG is the subgraph of $SP_{vis}$ induced by the states L2A (lead to accept) from which an accepting state is reachable, decorated with pass and inconclusive verdicts. A controllable TC exists iff the CTG is not empty, i.e., $q_0 \in L2A$ [25].

1 In TGV [25], the actions of test cases are symmetric w.r.t. those of the model M and the SUT, i.e., $A_O^{TC} \subseteq A_I^M$ (TC emits only inputs of M) and $A_I^{TC} \subseteq A_O^M \cup \{\delta\}$ (TC captures outputs and quiescences of SUT). To avoid confusion, we consider here that inputs and outputs of TC are the same as those of M and SUT.

The execution of a TC against the SUT corresponds to a parallel composition 
TC || SUT with synchronization on common observable actions, verdicts being 
determined by the trap states reached by a maximal trace of TC || SUT, i.e., a 
trace leading to a verdict state. Quiescent livelock states (infinite sequences of 
internal actions in the SUT) are detected using timers, and lead to inconclusive 
verdicts. A TC may have cycles, in which case global timers are required to 
prevent infinite test executions. 
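When $SP_{vis}$ is available explicitly, the definitions of L2A and of the CTG given above can be made concrete by an offline backward reachability analysis, the kind of whole-graph computation that an on-the-fly approach avoids. This Python sketch illustrates the definitions only; the representation (a deterministic $SP_{vis}$ as a dictionary of (label, target) lists with ?/! prefixes) and the function names are assumptions. Fail verdicts are left implicit, as on Fig. 1(c).

```python
def lead_to_accept(lts, accepting):
    """L2A: states from which some accepting state is reachable."""
    pred = {s: [] for s in lts}
    for s, succ in lts.items():
        for _, t in succ:
            pred[t].append(s)
    l2a, stack = set(accepting), list(accepting)
    while stack:                      # backward search from accepting states
        t = stack.pop()
        for s in pred[t]:
            if s not in l2a:
                l2a.add(s)
                stack.append(s)
    return l2a


def complete_test_graph(lts, q0, accepting):
    """Subgraph of SP_vis induced by L2A, decorated with verdicts."""
    l2a = lead_to_accept(lts, accepting)
    if q0 not in l2a:
        return None                   # empty CTG: no controllable TC exists
    ctg, verdict = {}, {}
    for s in l2a:
        ctg[s] = []
        if s in accepting:
            verdict[s] = "pass"       # accepting states are trap states
            continue
        for a, t in lts[s]:
            if t in l2a:
                ctg[s].append((a, t))
            elif a.startswith("!"):   # output leaving L2A: inconclusive
                ctg[s].append((a, t))
                verdict[t] = "inconc"
    return ctg, verdict


SPVIS = {0: [("?a", 1), ("?b", 5)], 1: [("!x", 2), ("!y", 3)], 2: [],
         3: [("!z", 4)], 4: [], 5: []}
CTG, VERDICT = complete_test_graph(SPVIS, 0, {4})
```

The input ?b towards state 5 is simply dropped, since the tester controls inputs, whereas the uncontrollable output !x towards state 2 is kept and leads to an inconclusive verdict.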


3 TESTOR 


We present the architecture and implementation of TESTOR, its on-the-fly algorithm for test-case extraction, and show several ways of specifying test purposes.


3.1 Architecture 


TESTOR takes as input a formal model (M), a test purpose (TP), and a predicate specifying the input/output actions of M. Depending on the chosen options, it produces as output either a complete test graph (CTG), or a test case (TC) extracted on the fly. TESTOR has a modular component-based architecture consisting of several on-the-fly IOLTS transformation components, interconnected according to the architecture shown on Fig. 2. The boxes represent transformation components and the arrows between them denote the implicit representations (post functions) of IOLTSs.

The first component produces the synchronous product (SP) between the model M and the test purpose TP. Following the conventions of TGV [25], the synchronous product supports *-transitions and implements the implicit addition of self-looping *-transitions. The next four reduction components progressively transform SP into $SP_{vis} = det(\Delta(SP))$ as follows: (i) τ-compression produces the suspension automaton $\Delta(SP)$ by squeezing the strongly connected components of τ-transitions and replacing them with δ-loops representing quiescence; (ii) τ-confluence eliminates redundant interleavings by giving priority to confluent τ-transitions, i.e., those whose neighbor transitions (going out from the same source state) do not bring new observational behavior; (iii) τ-closure computes the reflexive transitive closure on τ-transitions; (iv) the resulting τ-free IOLTS is determinized by applying the classical subset construction. The reduction by τ-compression is necessary for τ-confluence (which operates on IOLTSs without τ-cycles) and is also useful as a preprocessing step for τ-closure (whose algorithm is simpler in the absence of τ-cycles). Although τ-confluence is optional, it may drastically reduce the size of the IOLTS prior to τ-closure, therefore acting as an accelerator for the whole test selection procedure when SP contains large diamonds of τ-transitions produced by the interleavings of independent actions [31]. The first three reductions [31] are applied only if TESTOR detects the presence of τ-transitions in SP.
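Steps (iii) and (iv) can be illustrated together by a subset construction that applies τ-closure to every subset it creates. This Python sketch assumes that τ-compression has already introduced the δ-loops, represents τ by the label "tau", and folds τ-closure into determinization instead of running it as a separate on-the-fly component as TESTOR does; the names tau_closure and determinize are hypothetical.

```python
TAU = "tau"


def tau_closure(lts, states):
    """All states reachable from `states` using tau-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for a, t in lts[s]:
            if a == TAU and t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)


def determinize(lts, q0):
    """Subset construction over visible actions, on tau-closed subsets."""
    start = tau_closure(lts, [q0])
    dlts, todo = {start: {}}, [start]
    while todo:
        subset = todo.pop()
        moves = {}
        for s in subset:
            for a, t in lts[s]:
                if a != TAU:
                    moves.setdefault(a, set()).add(t)
        for a, targets in sorted(moves.items()):
            target = tau_closure(lts, targets)
            dlts[subset][a] = target
            if target not in dlts:
                dlts[target] = {}
                todo.append(target)
    return start, dlts


L = {0: [(TAU, 1), ("?a", 2)], 1: [("?a", 3)], 2: [("!x", 4)],
     3: [("!y", 4)], 4: []}
START, DET = determinize(L, 0)
```

In this example the two ?a transitions, one of them hidden behind a τ, merge into a single transition towards the subset {2, 3}, which is exactly why $SP_{vis}$ states are sets of states of the τ-free IOLTS.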


216 L. Marsso et al. 


[Figure: the synchronous product, the reductors (τ-compression, τ-confluence, τ-closure, determinization), the explorer producing the CTG or TC, and the BES solver; the I/O actions are an additional input]

Fig. 2. Architecture of TESTOR


The determinization produces as output the post function of the IOLTS $SP_{vis}$, whose states correspond to sets of states of the τ-free IOLTS produced by τ-closure. $SP_{vis}$ is processed by the explorer component, which builds the CTG or the TC by computing the corresponding subgraph whose states are contained in L2A. The reachability of accepting states is determined on the fly by evaluating the PDL [14] formula $\varphi_{L2A} = \langle true^* \rangle\, accept$ on the states visited by the explorer, where the atomic proposition accept denotes the accepting states. This check is done by translating the verification problem into a Boolean equation system (BES) and solving it on the fly using a BES solver component [32]. The synchronous product and the explorer are the only newly developed components, all the other ones (represented in gray on Fig. 2) being already available in the libraries of the OPEN/CAESAR [15] environment of CADP.
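The on-the-fly evaluation of $\varphi_{L2A}$ can be approximated by a memoizing depth-first search over the post function that also records, for each variable found true, one transition towards an accepting state. This Python class is a hypothetical simplification, not the CADP BES solver of [32]: it resolves the disjunctive equations $X_q = (q \models accept) \lor \bigvee X_{q'}$ locally, stops as soon as an accepting or already-true state is met, and marks the visited states false only after an exhaustive search. Unlike [32], it may re-explore states whose value was left unknown.

```python
class L2ASolver:
    """Local resolution of X_q = accept(q) or OR_{q -b-> q'} X_q' (sketch)."""

    def __init__(self, post, accept):
        self.post = post       # post function: state -> [(label, target)]
        self.accept = accept   # atomic proposition accept
        self.value = {}        # solved variables: state -> bool
        self.witness = {}      # true, non-accepting state -> (label, next state)

    def solve(self, q):
        """Return True iff q satisfies <true*> accept."""
        if q in self.value:
            return self.value[q]
        parent, stack = {q: None}, [q]
        while stack:
            s = stack.pop()
            if self.accept(s) or self.value.get(s) is True:
                self._mark_path(parent, s)
                return True
            for a, t in self.post(s):
                if t not in parent and self.value.get(t) is not False:
                    parent[t] = (s, a)
                    stack.append(t)
        for s in parent:       # exhaustive search: no state reaches accept
            self.value[s] = False
        return False

    def _mark_path(self, parent, s):
        """Set the DFS-tree path ending at s to true, recording witnesses."""
        self.value[s] = True
        while parent[s] is not None:
            p, a = parent[s]
            self.value[p] = True
            self.witness[p] = (a, s)
            s = p

    def diagnostic(self, q):
        """Transitions leading from a true state q to an accepting state."""
        path = []
        while not self.accept(q):
            a, q = self.witness[q]
            path.append((a, q))
        return path


G = {0: [("?a", 1)], 1: [("!x", 2), ("!y", 3)], 2: [], 3: [("!z", 4)],
     4: [], 5: [("?b", 5)]}
solver = L2ASolver(lambda s: G[s], lambda s: s == 4)
```

Because solved variables stay in self.value, a later query on a state already found true (such as state 3 after solving state 0) is answered without further exploration.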


3.2 On-the-Fly Test Selection Algorithm 


We describe below the algorithm used by the explorer component to extract the CTG or a (controllable) TC from the $SP_{vis}$ IOLTS on the fly.

Basically, the CTG is the subgraph of $SP_{vis}$ containing all states in L2A, extended with some states denoting verdicts. The accepting states (which are by definition part of L2A) correspond to pass verdicts. For every state $q \in L2A$, the output transitions $q \xrightarrow{!a} q'$ with $q' \notin L2A$ lead to inconclusive verdicts, and the output transitions other than those contained in $SP_{vis}$ lead to fail verdicts. To compute the CTG, the explorer component performs a forward traversal of $SP_{vis}$ and keeps the states $q \in L2A$, which satisfy the formula $\varphi_{L2A}$. The check $q \models \varphi_{L2A}$ is done by solving the variable $X_q$ of the minimal fixed point BES $\{X_q = (q \models accept) \lor \bigvee_{q \xrightarrow{b} q'} X_{q'}\}$ denoting the interpretation of $\varphi_{L2A}$ on $SP_{vis}$. The resolution is carried out on the fly using the algorithm for disjunctive BESs proposed in [32]. If the CTG is not empty (i.e., $q_0 \models \varphi_{L2A}$), then it contains at least one controllable TC [25].

The extraction of a TC uses a similar forward traversal as for generating the CTG, extended to ensure controllability, i.e., every state q of the TC either has only one outgoing input transition $q \xrightarrow{?a} q'$ with $q' \in L2A$, or has all output transitions $q \xrightarrow{!a} q''$ of $SP_{vis}$ with $q'' \in L2A$. The essential ingredient for selecting the input transitions on the fly is the diagnostic generation for BESs [30], which provides, in addition to the Boolean value of a variable, also the minimal fragment (w.r.t. inclusion) of the BES illustrating the value of that variable. For a variable $X_q$ evaluated to true in the disjunctive BES underlying $\varphi_{L2A}$, the diagnostic (witness) is a sequence $X_q \xrightarrow{b_1} X_{q_1} \xrightarrow{b_2} \cdots \xrightarrow{b_k} X_{q_k}$, where $q_k \models accept$. This induces a sequence of transitions $q \xrightarrow{b_1} q_1 \xrightarrow{b_2} \cdots \xrightarrow{b_k} q_k$ in $SP_{vis}$ leading to an accepting state. Since all states $q, q_1, \ldots, q_k$ also belong to L2A, this diagnostic sequence is naturally part of the TC under construction.

More precisely, the TC extraction algorithm works as follows. If $q_0 \models \varphi_{L2A}$, the diagnostic sequence for $q_0$ is inserted in the TC (otherwise the algorithm stops because the CTG is empty). For the TC illustrated on Fig. 1(c), this first diagnostic sequence goes through the states $q_0$, $q_1$, $q_5$, qə, and $q_{11}$, alternating input and output transitions. Then, the main loop consists in choosing an unexplored transition of the TC and processing it.


– If it is an input transition $q \xrightarrow{?a} q'$, nothing is done, since the target state $q' \in L2A$ by construction. Furthermore, the presence of this transition in the TC makes its source state q controllable. This is the case, e.g., for the input transition from $q_0$ to $q_1$ in the TC shown on Fig. 1(c).
– If it is an output transition $q \xrightarrow{!a} q'$, each of its neighboring output transitions $q \xrightarrow{!a'} q''$ is examined in turn. If the target state $q'' \notin L2A$, the transition is inserted in the TC and $q''$ is marked with an inconclusive verdict. This is the case, e.g., for the output transition from $q_1$ to $q_4$ in the TC on Fig. 1(c). If $q'' \in L2A$, the transition is inserted in the TC, together with the diagnostic sequence produced for $q''$. This is the case, e.g., for the output transition from qg to qs in the TC on Fig. 1(c).


The insertion of a diagnostic sequence in the TC stops when it meets a state q 
that already belongs to the TC, since by construction the TC already contains a 
sequence starting at q and leading to an accepting state. This is the case, e.g., for 
the diagnostic sequence starting at state gs in the TC on Fig. 1(c). In this way, 
the TC is built progressively by inserting the diagnostic sequences produced for 
each of the encountered states in L2A. 
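The TC extraction loop just described can be sketched in Python as follows. To keep the sketch self-contained, the on-the-fly BES diagnostics are replaced by an offline backward breadth-first search that stores, for each state of L2A, one transition leading towards an accepting state; this substitution, the dictionary representation of $SP_{vis}$, the ?/! label prefixes, and all names are assumptions. Each state receives at most one diagnostic transition, so a state whose diagnostic transition is an input keeps a single input, while a state whose diagnostic transition is an output gets all its output transitions, as controllability requires.

```python
from collections import deque


def witnesses(lts, accepting):
    """Backward BFS: next transition towards accept for every L2A state."""
    pred = {s: [] for s in lts}
    for s, succ in lts.items():
        for a, t in succ:
            pred[t].append((s, a))
    nxt, todo = {s: None for s in accepting}, deque(accepting)
    while todo:
        t = todo.popleft()
        for s, a in pred[t]:
            if s not in nxt:
                nxt[s] = (a, t)
                todo.append(s)
    return nxt                              # its keys are exactly L2A


def extract_tc(lts, q0, accepting):
    nxt = witnesses(lts, accepting)
    if q0 not in nxt:
        return None                         # empty CTG: no test case
    tc, verdict, work = {}, {}, []

    def insert_diagnostic(q):
        # Follow witness transitions until reaching a state already in the TC.
        while q not in tc:
            tc[q] = []
            if nxt[q] is None:              # accepting state
                verdict[q] = "pass"
                return
            a, t = nxt[q]
            tc[q].append((a, t))
            work.append((q, a))
            q = t

    insert_diagnostic(q0)
    while work:                             # unexplored transitions of the TC
        q, a = work.pop()
        if a.startswith("?"):
            continue                        # input: target in L2A already
        for b, t in lts[q]:                 # output: examine all neighbors
            if not b.startswith("!") or (b, t) in tc[q]:
                continue
            tc[q].append((b, t))
            if t in nxt:
                insert_diagnostic(t)
            else:
                verdict[t] = "inconc"
    return tc, verdict


SPVIS = {0: [("?a", 1)], 1: [("!x", 2), ("!y", 3)], 2: [],
         3: [("?a", 4), ("?b", 5)], 4: [("!z", 6)], 5: [("!z", 6)], 6: []}
TC, VERDICT = extract_tc(SPVIS, 0, [6])
```

State 3 offers two inputs in the CTG, but the extracted TC keeps only the one on its diagnostic sequence, which leaves state 5 out of the test case.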

During the forward traversal of $SP_{vis}$, the explorer component continuously interacts with the BES solver, which in turn triggers other forward explorations of $SP_{vis}$ to evaluate $\varphi_{L2A}$. The repeated invocations of the solver have a cumulated linear complexity in the size of the BES (and hence, the size of $SP_{vis}$), because the BES solver keeps its context in memory and does not recompute already solved Boolean variables [32].


3.3 Implementation 


TESTOR is built upon the generic libraries of the OPEN/CAESAR [15] environment, in particular the on-the-fly reductions by τ-compression, τ-confluence, and τ-closure [31], and the on-the-fly BES resolution [32]. The tool (available at http://convecs.inria.fr/software/testor) consists of 5022 lines of C and 1106 lines of shell script.


3.4 Examples of Different Ways to Express a Test Purpose 


Consider an asynchronous implementation of the DES (Data Encryption Standard) [37]. In a nutshell, the DES is a block cipher taking three inputs: a Boolean indicating whether encryption or decryption is requested, a 64-bit key, and a 64-bit block of data. For each triple of inputs, the DES computes the 64-bit (de)crypted data, performing sixteen iterations of the same cipher function, each iteration with a different 48-bit subkey extracted from the 64-bit key.

A natural TP for the DES is to search for a sequence corresponding to the 
encryption of a single data block, for instance 0x0123456789abcdef with key 
0x133457799bbcdff1, the expected result of which is 0x85e813540f0ab405. 
Using the LNT language [8,17], one would be tempted to write this TP as 
the process PURPOSE1, simply containing the desired sequence of three inputs (on 
gates CRYPT, KEY, and DATA) followed by an output (on gate OUTPUT): 


process PURPOSE1 [CRYPT: CB, KEY, DATA, OUTPUT: C64, T_ACCEPT: none] is
   CRYPT (true);                  -- input
   KEY (C_13345779_9bbcdff1);     -- input
   DATA (C_01234567_89abcdef);    -- input
   OUTPUT (C_85e81354_0f0ab405);  -- output
   loop T_ACCEPT end loop
end process


Following the conventions of TGV, we mark accepting (respectively, refusal) 
states by a self-loop labeled with T_ACCEPT (respectively, T_REFUSE). 

However, PURPOSE1 is not complete: e.g., initially only one action out of the possible set {CRYPT (true), CRYPT (false), KEY (C_13345779_9bbcdff1), ...} is specified. Thus, when computing the synchronous product with the model, PURPOSE1 is implicitly completed by self-loops labeled with “*” (as in the TP shown on Fig. 1(b)), yielding a significantly more complex TC than expected. For instance, the implicit *-transition in the initial state allows the tester to perform the sequence “CRYPT (false); CRYPT (true)” rather than the expected first action “CRYPT (true)”. To force the generation of a TC corresponding to the simple sequence, it is necessary to explicitly complete the TP with transitions to refusal states, as shown by the LNT process PURPOSE2, where gate OTHERWISE stands for the special label “*”:


process PURPOSE2 [CRYPT: CB, KEY, DATA, OUTPUT: C64, SUBKEY: C48,
                  T_ACCEPT, T_REFUSE, OTHERWISE: none] is
   select -- refuse any rendezvous but “CRYPT (TRUE)”
      CRYPT (true)
   [] OTHERWISE; loop T_REFUSE end loop
   end select;
   select -- refuse any rendezvous but “KEY (C_13345779_9BBCDFF1)”
      KEY (C_13345779_9BBCDFF1)
   [] OTHERWISE; loop T_REFUSE end loop
   end select;
   loop L in
      select -- refuse any rendezvous but on gates DATA and SUBKEY
         DATA (C_01234567_89ABCDEF); break L
      [] SUBKEY (?any BIT48)
      [] OTHERWISE; loop T_REFUSE end loop
      end select
   end loop;
   loop -- refuse any rendezvous but on gates OUTPUT and SUBKEY
      select -- test target is reached by a rendezvous on OUTPUT
         OUTPUT (C_85E81354_0F0AB405); loop T_ACCEPT end loop
      [] SUBKEY (?any BIT48)
      [] OTHERWISE; loop T_REFUSE end loop
      end select
   end loop
end process


Instead of using the dedicated synchronous product, it is also possible to take advantage of the multiway rendezvous [18,23] to compositionally annotate the model, relying on the LNT operational semantics [8, Appendix B] to cut undesired branches. For instance, the same effect as the synchronous product with PURPOSE2 can be obtained by skipping the left-most component “synchronous product” of Fig. 2, i.e., feeding the τ-reduction steps with the IOLTS described by the following LNT parallel composition:


par CRYPT, KEY, DATA, OUTPUT in 

DES [CRYPT, KEY, DATA, OUTPUT, SUBKEY] 
|| PURPOSE1 [CRYPT, KEY, DATA, OUTPUT, T_ACCEPT] 
end par 
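The effect of such a parallel composition can be illustrated by a binary product of two LTSs that synchronizes on a given set of gates and interleaves all other actions. This Python sketch is a deliberate simplification of the LNT multiway rendezvous (two processes only, labels written as strings "GATE offers", and synchronization only on identical labels, without value passing via ?x); the toy MODEL and PURPOSE below are hypothetical stand-ins for DES and PURPOSE1, with SUBKEY left unsynchronized as in the composition above.

```python
def gate(label):
    """Gate name of a label such as 'KEY k' (its first word)."""
    return label.split()[0]


def par(p, p0, q, q0, sync_gates):
    """Explore P || Q, synchronizing on the gates listed in sync_gates."""
    init = (p0, q0)
    lts, todo = {init: []}, [init]
    while todo:
        s = todo.pop()
        sp, sq = s
        moves = []
        for a, tp in p[sp]:
            if gate(a) not in sync_gates:
                moves.append((a, (tp, sq)))           # P moves alone
            else:                                     # rendezvous with Q
                moves += [(a, (tp, tq)) for b, tq in q[sq] if b == a]
        for b, tq in q[sq]:
            if gate(b) not in sync_gates:
                moves.append((b, (sp, tq)))           # Q moves alone
        lts[s] = moves
        for _, t in moves:
            if t not in lts:
                lts[t] = []
                todo.append(t)
    return lts


MODEL = {0: [("CRYPT true", 1), ("CRYPT false", 5)], 1: [("KEY k", 2)],
         2: [("SUBKEY s1", 3)], 3: [("OUTPUT r", 4)], 4: [], 5: []}
PURPOSE = {0: [("CRYPT true", 1)], 1: [("KEY k", 2)], 2: [("OUTPUT r", 3)],
           3: [("T_ACCEPT", 3)]}
PROD = par(MODEL, 0, PURPOSE, 0, {"CRYPT", "KEY", "DATA", "OUTPUT"})
```

The branch starting with CRYPT false is cut because the purpose offers no matching rendezvous, SUBKEY interleaves freely, and the T_ACCEPT self-loop marks the state reached after OUTPUT.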


This approach based on the multiway rendezvous even supports data handling. For instance, to observe the data (variable D), key (variable K), and whether an encryption or decryption is requested (variable C), and to verify the correctness of the result (in the rendezvous “OUTPUT (DES (C, K, D))”, DES denotes a function implementing the DES algorithm), one just has to replace, in the above parallel composition, the call to PURPOSE1 by a call to the process PURPOSE3:


process PURPOSE3 [CRYPT: CB, KEY, DATA, OUTPUT: C64, T_ACCEPT: none] is 
var C: BOOL, D, K: BIT64 in 
CRYPT (?C); 
KEY (?K); 
DATA (?D); 
OUTPUT (DES (C, K, D)); 
loop T_ACCEPT end loop 
end var 
end process 
4 Experimental Evaluation


TESTOR follows TGV’s implementation of the ioco-based testing theory [39,41], using the same IOLTS processing steps, adding only the τ-confluence reduction. For each step, TESTOR uses components developed, tested, and used in other tools for more than a decade. In this section, we focus on performance aspects and we compare TESTOR to TGV. For this purpose, we conducted several experiments with models and test purposes, both automatically generated and drawn from academic examples and realistic case studies.

For assessing the correctness of TESTOR, we checked that each TC is included in the CTG, and we compared the TCs and CTGs generated by TESTOR to those generated by TGV. The latter comparison required several additional steps, automated using shell scripts and a dedicated tool (about 300 lines of C code). First, we generated the LTS of each TP, applying appropriate renamings, because TGV expects the TP to be an explicit LTS, with accepting (resp. refusing) states marked by a self-looping transition labeled with ACCEPT (resp. REFUSE), and with the label “*”. Then, we modified the TC and CTG generated by TESTOR so that each label includes the information whether the label is an input or output, and which verdict state (if any) is reached by the corresponding transition. Using this approach, we found that the CTGs generated by both tools were strongly bisimilar. The same does not hold for all the TCs, because the tools may ensure controllability in different ways, leading to non-bisimilar, but correct TCs.

For each pair of model and TP, we measured the runtime and peak memory usage of computing a TC or CTG (using TESTOR and TGV), excluding the fixed cost of compiling the LNT code (model and TP) and generating the executable. The experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several universities as well as other organizations (see https://www.grid5000.fr). Concretely, we used the petitprince cluster located in Luxembourg, consisting of sixteen machines, each equipped with 2 Intel Xeon E5-2630L CPUs and 32 GB RAM, and running 64-bit Debian GNU/Linux 8 and CADP 2017-i. Each measurement corresponds to the average of ten executions.


4.1 Test Purposes Taken from Case Studies 


Table 1 summarizes the results for some selected examples. The first two have been kindly provided by Alexander Graf-Brill, and correspond to initial versions of TPs for his EnergyBus model [20]; both aim at exhibiting a particular boot sequence, the second one using REFUSE transitions. The next four examples have been used by STMicroelectronics to verify a cache-coherence protocol [27]. The last three correspond to the three TPs presented in Sect. 3.4 and check the correctness of a simplified² version of the asynchronous implementation of the DES (Data Encryption Standard) [37]. These examples cover a large spectrum of characteristics: from no τ-transitions (ACE) to huge confluent τ-components (DES), from few visible transitions (DES) to many outgoing visible transitions (EnergyBus), and a test selection more or less guided via refusal states.

TESTOR: A Modular Tool for On-the-Fly Conformance Test 221

Table 1. Run-time performance for selected examples

                                       TESTOR                         TGV
                                test case           CTG     test case           CTG
example                       time   mem.   time   mem.   time   mem.   time   mem.
EnergyBus                        3     81    182    181      2    137     52    858
EnergyBus (with REFUSE)          1     67      1     66      0     66      0     43
ACE UniqueDirty                 45    121    346    451     75    159   3047    643
ACE SharedDirty                384    510    342    529   3821    746   3920    746
ACE SharedClean                298    415    325    523   2820    628   3474    663
ACE Data Inconsistency          24    116    580    711     24    142   6701    894
DES (PURPOSE1)               22109    300        >1 week        >43 GB       >220 GB
DES (PURPOSE2)               27344    332     27     86     24   6177     24   6176
DES (PURPOSE3)                   2    T74      4    100        not applicable

Execution time is given in seconds and memory usage in MB.

We observe that TESTOR requires less memory than TGV for all examples,
but most significantly for the DES. Although TESTOR is several orders of
magnitude slower than TGV for the DES when using the synchronous product
(TPs PURPOSE1 and PURPOSE2), it requires only two seconds to generate a TC
or CTG when the TP with data handling PURPOSE3 is combined with the model
using an LNT parallel composition. This is because the LNT parallel composition,
handled by the LNT compiler, enables more aggressive optimizations. Thus, using
LNT parallel composition to annotate the model’s accepting and refusal states is
not only more convenient (thanks to the multiway rendezvous) and data aware,
but also much more efficient: it is even possible to generate a TC for the original
DES model (167 million states, 1.5 billion transitions) in less than 40 min.

For the ACE examples, TESTOR both runs faster and requires less memory than
TGV. This is partly due to an optimization of TESTOR, which deactivates the
various reductions of τ-transitions. For a fair comparison, we also ran experiments
forcing the execution of these reductions. For the extraction of a TC, this doubles
the execution time and triples the memory requirements. For the computation of
a CTG, this increases the memory requirements by a factor of 1.5, without
significantly modifying the execution time.


2 The S-boxes are executed sequentially rather than in parallel, and the gate SUBKEY
is left visible to separate the iterations of the DES algorithm and thus significantly
reduce the size of τ-components. For the extraction of a TC for PURPOSE2 from the
full version of the DES, TESTOR would run for several weeks and TGV would require
more than 700 GB of RAM.
4.2 Automatically Generated Test Purposes


To evaluate the performance, we used a collection of 9791 LTSs with up to
50 million transitions, taken from the non-regression test base of CADP. For
each LTS M of the collection, we automatically generated two TPs: one to test
the reachability of an action and another to test the presence of an execution
sequence. For the former TP, we sorted the actions of the LTS alphabetically
and checked the reachability of the first action, considering the second half of
the action set as inputs. For the latter TP, we used the EXECUTOR tool³ to
extract a sequence of up to 1000 visible actions, which we transformed into a TP,
considering as inputs all actions whose rank is an odd number. Technically,
this transformation consists of adding to each state of the sequence a self-loop
labeled with τ and a *-transition to a refusal state.
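The sequence-to-TP transformation just described can be sketched as follows. This is a minimal Python illustration, not TESTOR's code: the TP is kept as a plain list of (source, label, target) triples, the names `TAU`, `ANY`, `REFUSE`, `PurposeLTS` and `sequence_to_tp` are hypothetical, and "rank" is read here as the 1-based position of the action in the extracted sequence, which is an assumption.

```python
from dataclasses import dataclass, field

TAU = "tau"        # hypothetical label standing for the internal action τ
ANY = "*"          # "*" read as "any other action"
REFUSE = "refuse"  # hypothetical name of the single refusal state


@dataclass
class PurposeLTS:
    transitions: list = field(default_factory=list)  # (source, label, target)
    accepting: set = field(default_factory=set)
    refusal: set = field(default_factory=set)
    inputs: set = field(default_factory=set)


def sequence_to_tp(actions):
    """Turn a sequence a1 ... an of visible actions into a TP with states 0..n."""
    tp = PurposeLTS(refusal={REFUSE})
    n = len(actions)
    for i, action in enumerate(actions):
        tp.transitions.append((i, action, i + 1))  # follow the sequence
    for i in range(n + 1):
        tp.transitions.append((i, TAU, i))         # tolerate internal steps
        tp.transitions.append((i, ANY, REFUSE))    # prune every other branch
    tp.accepting.add(n)                            # end of the sequence
    # Assumption: "rank" = 1-based position of the action in the sequence.
    tp.inputs = {a for pos, a in enumerate(actions, start=1) if pos % 2 == 1}
    return tp


tp = sequence_to_tp(["a", "b", "c"])
print(len(tp.transitions), sorted(tp.inputs))  # 11 ['a', 'c']
```

The τ self-loops let internal steps of the model occur without breaking the match, while the `*` edges send every action not prescribed by the sequence to the refusal state, so that exploration is cut there.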

From the generated pairs (M, TP), we eliminated those for which the automatic
generation of a TP failed (for instance, due to special actions that would
require particular treatment) and those for which the computation of a TC or
CTG took too much time or required too much memory with either TESTOR
or TGV. This left a collection of 13,142 pairs (M, TP) for which both tools
could extract a TC. For 12,654 of them, both tools could also compute the
CTG. Figure 3 displays the results for each example, using logarithmic scales for
both execution time and memory requirements, to make the differences for small
values more visible.

As for the case studies, we observe that TESTOR and TGV choose differ- 
ent tradeoffs between computation time and memory requirements. On average, 
TESTOR requires 0.3 times less memory and runs 1.3 (respectively 0.5) times 
faster to compute a TC (respectively the CTG). When considering only the 1005 
pairs with more than 500,000 transitions in the LTS, the average numbers show 
a larger difference. On average for these larger examples, to compute a CTG, 
TESTOR requires 1.4 times less memory, but runs 3.5 times longer; to compute 
a TC, TESTOR requires 2.7 times less memory and runs 0.7 times faster. 

Also, while examples had to be excluded for both tools due to excessive
run time, several examples were excluded because TGV, but not TESTOR, ran
out of memory. Given that TCs are usually much smaller than CTGs,
the on-the-fly extraction of a TC by TESTOR is generally faster and consumes
less memory than the generation of the CTG. We also observed that the CTGs
produced by TESTOR are sometimes smaller than (although strongly bisimilar
to) those produced by TGV.

While trying to understand these results in more detail, we found examples
where each tool is one or two orders of magnitude faster or more memory-efficient
than the other. Indeed, the benefits of the different reductions applied in the tools
depend heavily on the characteristics of the example, most notably the sizes of the
various subgraphs explored (τ-components, L2A). For instance, when the model
M does not contain any τ-transition, there is no point in applying the reductions
(τ-compression, τ-confluence, and τ-closure).


3 http://cadp.inria.fr/man/executor.html. 
Fig. 3. Compared performance of TESTOR and TGV


The modular architecture of TESTOR enabled us to easily experiment with 
variants of the algorithm used for solving the BES underlying Yioa. By default, 
when extracting a TC on the fly, we use the depth-first search (DFS) algorithm, 
which for disjunctive BESs stores only variables and not their dependencies 
(and hence only the states, and not the transitions of the model). Using the 
breadth-first search (BFS) algorithm of the solver produces smaller TCs, because 
it generates the shortest diagnostic sequences for states in L2A. However, this 
comes at the price of an increased execution time and memory consumption, a 
known phenomenon regarding BFS versus DFS algorithms [32]. Thus, one can
choose between BFS and DFS resolution, depending on whether the size of the TC
extracted on the fly matters more than the resources required to compute it.
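The BFS/DFS tradeoff can be seen on a plain graph search, as a rough analogy only (TESTOR resolves a BES, not a graph directly). The Python sketch below, with hypothetical function names, looks for a path from an initial state to an accepting state: BFS returns a shortest diagnostic path but keeps a parent entry for every visited state, whereas DFS returns the first path it happens to find, which may be longer.

```python
from collections import deque


def bfs_path(graph, start, accepting):
    """Shortest path (list of states) from start to an accepting state, or None."""
    parent = {start: None}  # one parent entry per visited state
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s in accepting:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in graph.get(s, []):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None


def dfs_path(graph, start, accepting):
    """First path found by depth-first search; not necessarily the shortest."""
    visited = set()

    def visit(s):
        visited.add(s)
        if s in accepting:
            return [s]
        for t in graph.get(s, []):
            if t not in visited:
                rest = visit(t)
                if rest is not None:
                    return [s] + rest
        return None

    return visit(start)


graph = {0: [1, 3], 1: [2], 2: [3], 3: []}
print(bfs_path(graph, 0, {3}))  # [0, 3]
print(dfs_path(graph, 0, {3}))  # [0, 1, 2, 3]
```

On this small graph, DFS commits to the first successor and produces a diagnostic path of three steps, while BFS finds the direct one-step path.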


5 Related Work 


Although model-based conformance testing has been intensively studied, there 
are only a few tools that use variants of the ioco conformance relation and that 
are still actively developed [4]. Other model-based tools for combinatorial and 
statistical testing, or white box testing are described in [43]. In the following, we 
compare TESTOR to the most closely related tools. 

TorX [42] and JTorX [2] are online test generation tools, equipped with a 
set of adapters to connect the tester to the SUT. The latest versions support 
test purposes (TPs), but they are used differently than in TESTOR. Indeed,
JTorX yields a two-dimensional verdict [3]: one dimension is the ioco correctness
verdict (pass or fail), and the other dimension indicates whether the test
objective has been reached. This contrasts with TESTOR, which generates test
cases (TCs) that, by construction, keep the execution inside the set of states
that lead to accepting states (L2A), and that stop the test execution as soon as
possible with a verdict: fail if non-conformance has been detected, pass if an
accepting state has been reached, or inconclusive if leaving L2A is unavoidable.
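The notion of L2A and the three verdicts can be illustrated on a small deterministic graph. The Python sketch below is a simplification under stated assumptions: the helper names are hypothetical, every observed action is treated as an output of the SUT, and quiescence is ignored. L2A is computed by a backward search from the accepting states, and a verdict is emitted as soon as an unexpected action, an accepting state, or a state outside L2A is reached.

```python
def lead_to_accept(edges, accepting):
    """States from which some accepting state is reachable (backward search)."""
    preds = {}
    for src, _label, dst in edges:
        preds.setdefault(dst, set()).add(src)
    l2a, stack = set(accepting), list(accepting)
    while stack:
        state = stack.pop()
        for pred in preds.get(state, ()):
            if pred not in l2a:
                l2a.add(pred)
                stack.append(pred)
    return l2a


def verdict(edges, start, accepting, observed):
    """Replay observed actions; return 'fail', 'pass', 'inconclusive' or None."""
    l2a = lead_to_accept(edges, accepting)
    succ = {(src, label): dst for src, label, dst in edges}  # deterministic graph
    state = start
    for action in observed:
        if (state, action) not in succ:
            return "fail"          # action not allowed by the specification
        state = succ[(state, action)]
        if state in accepting:
            return "pass"          # the test purpose has been satisfied
        if state not in l2a:
            return "inconclusive"  # no accepting state is reachable any more
    return None                    # no verdict yet: the test would go on


edges = [(0, "req", 1), (1, "ack", 2), (1, "err", 3)]
print(verdict(edges, 0, {2}, ["req", "ack"]))   # pass
print(verdict(edges, 0, {2}, ["req", "err"]))   # inconclusive
print(verdict(edges, 0, {2}, ["req", "oops"]))  # fail
```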

Uppaal is a toolbox for the analysis of timed systems, modeled as timed 
automata extended with data. Three test generation tools exist for Uppaal timed 
automata. Uppaal-Tron [28] is an online test generation tool, taking as input a 
specification and an environment model, used to constrain the test generation. 
Uppaal-Tron is also equipped with a set of adapters to derive and execute the 
generated tests on the SUT. Contrary to TESTOR, the TCs generated from 
Uppaal-Tron can be irrelevant, because the generation is not guided by TPs. 
Uppaal-Cover [22] generates offline a comprehensive test suite from a deter- 
ministic Uppaal model and coverage criteria specified by observer automata. 
Uppaal-Cover attempts to build a small test suite satisfying the coverage criteria,
by selecting those TCs satisfying the largest parts of the coverage criteria. In 
contrast to TESTOR and Uppaal-Tron, Uppaal-Cover generates offline tests. 
Offline generation does not face the state-space explosion, but also limits the 
expressiveness of the specification language (e.g., nondeterministic models are
not allowed). Uppaal-Yggdrasil [26] generates offline test suites for deterministic 
Uppaal models, using a three-step strategy to achieve good coverage: (i) a set of 
reachability formulas, (ii) random execution, and (iii) structural coverage of the 
transitions in the model. The guidance of the test generation by a temporal logic 
formula is similar to the use of a TP. However, the TPs supported by TESTOR 
(and TGV) can express more complex properties than reachability, and enable 
one to control the explored part of the model (using refusal states). 

On-the-fly test generation tools also exist for the synchronous dataflow lan- 
guage Lustre [21], e.g., Lutess [12], Lurette [24], and Gatel [29]. Contrary to 
TESTOR, these tools do not check the ioco relation, but randomly select TCs, 
satisfying constraints of an environment description and an oracle. 

In IOLTS, actions are monolithic, which is ill-suited to realistic models that
involve data handling. STG (Symbolic Test Generator) [11] breaks the monolithic
structure of actions, enabling access to the data values, and generates tests
on the fly, handling data values symbolically. This enables more user-friendly 
TPs and more abstract TCs, because not all possible values have to be enu- 
merated. However, the complexity of symbolic computation is not negligible in 
practice. When using the LNT parallel composition, TESTOR can handle data 
(see example in Sect. 3.4) without the cost of symbolic computation, but still has 
to enumerate data explicitly when generating the TC. T-Uppaal [34] uses sym- 
bolic reachability analysis to generate tests on the fly and then simultaneously 
executes them on the SUT. The cost of symbolic algorithms turns out to be
high for online testing.
When executing a generated TC against a SUT, it is necessary to refine it
to take into account the asynchronous communication between the SUT and
the tester. Indeed, the SUT accepts every input at any time, whereas the TC
is deterministic, i.e., there is no choice between an input and an output. An
approach for connecting a TC (randomly selected) and an asynchronous SUT 
was defined in [44]. A similar approach using TPs to guide the test generation 
was proposed in [5] and subsequently extended to timed automata [6]. Recently, 
this kind of connection was automated by the MOTEST tool [19]. 


6 Conclusion 


We presented TESTOR, a new tool for on-the-fly conformance test case genera- 
tion for asynchronous concurrent systems. Like the existing tool TGV, TESTOR 
was developed on top of the CADP toolbox [16] and brings several enhancements: 
online testing by generating (controllable) test cases completely on the fly; a more 
versatile description of test purposes using the LNT language; and a modular 
architecture involving generic graph manipulation components from the
OPEN/CAESAR environment [15]. The modularity of TESTOR simplifies maintenance
and fine-tuning of graph manipulation components, e.g., by adding or removing
on-the-fly reductions, or by replacing the synchronous product. Besides the
ability to perform online testing, the on-the-fly test selection algorithm
sometimes makes it possible to extract test cases even when the generation of
the complete test graph (CTG) is infeasible.

The experiments we carried out on tens of thousands of benchmark examples
and three industrial case studies show that TESTOR consumes less memory
than TGV, which in turn is sometimes faster for generating CTGs. We plan to
experiment with state space caching techniques [33] and with other on-the-fly 
reductions to accelerate CTG generation in TESTOR. We also plan to investigate 
how to facilitate the description of test purposes, by deriving them from the 
action-based, branching-time temporal properties of the model (following the 
results of [13] in the state-based, linear-time setting) or by synthesizing them 
according to behavioral coverage criteria. 
Abstract. Dynamic partial order reduction (DPOR) algorithms are used
in stateless model checking (SMC) to combat the combinatorial explosion 
in the number of schedulings that need to be explored to guarantee sound- 
ness. The most effective of them, the Optimal DPOR algorithm, is optimal 
in the sense that it explores only one scheduling per Mazurkiewicz trace. 
In this paper, we enhance DPOR with the notion of observability, which 
makes dependencies between operations conditional on the existence of 
future operations, called observers. Observers naturally lead to a lazy con- 
struction of dependencies. This requires significant changes in the core of 
POR algorithms (and Optimal DPOR in particular), but also makes the 
resulting algorithm, Optimal DPOR with Observers, super-optimal in the 
sense that it explores exponentially fewer schedulings than Mazurkiewicz
traces in some cases. We argue that observers come naturally in many con- 
currency models, and demonstrate the performance benefits that Optimal 
DPOR with Observers achieves in both an SMC tool for shared memory 
concurrency and a tool for concurrency via message passing, using both 
synthetic and actual programs as benchmarks. 


1 Introduction 


Testing and verification of concurrent programs is hard, as it requires reason- 
ing about all the ways in which operations executed by different processes (or 
threads) can interfere. Stateless model checking (SMC) [12] is a technique with 
low memory requirements that can be effective in finding concurrency errors or 
proving that a program cannot reach an error state by systematically explor- 
ing all the ways in which such operations can be interleaved. The technique 
requires taking control of the scheduler and subsequently executing the program 
multiple times, each time imposing a different scheduling of the processes. If
every process is considered at every execution step, however, the number of
possible schedulings grows exponentially w.r.t. the total length of program
execution. Partial order reduction (POR) techniques [9,11,20,22] address this
problem by prescribing the exploration of only a subset of schedulings, albeit
a subset that is sufficient to cover all behaviours. POR techniques take advantage
of the fact that most pairs of operations by different processes in typical
concurrent programs are not interfering. As a result, a scheduling E that can be
obtained from another scheduling E’ by swapping adjacent but non-interfering
(independent) execution steps will make the program behave in exactly the same
way as E’; such schedulings have the same partial order of interfering operations
and belong to the same equivalence class, called a Mazurkiewicz trace [19]. It is
sufficient for SMC algorithms to explore only one scheduling in each such
equivalence class.

© The Author(s) 2018

D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 229-248, 2018.
https://doi.org/10.1007/978-3-319-89963-3_14


230 S. Aronis et al.

POR algorithms operate by examining pairs of interfering operations. If it is 
possible to execute such operations in the reverse order, then their partial order 
will be different, and a scheduling from the relevant equivalence class must also 
be explored. For soundness, POR techniques need to be conservative, treating 
operations as interfering even in cases where they are not. Increasing the accu- 
racy of interference detection can therefore significantly improve the effectiveness 
of any POR technique. In early POR techniques, interference was determined 
statically, leading to over-approximations and limiting the achievable reduction. 
The efficiency of POR was later increased using semantic information to decide 
which operations interfere [13]. Dynamic Partial Order Reduction (DPOR) [10] 
further improved the effectiveness of POR algorithms by allowing interference 
to be determined from data obtained during the program’s execution. 
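As a concrete check of these notions, the following Python sketch enumerates all schedulings of a small program and groups them into Mazurkiewicz traces by recording the order of every pair of interfering steps: two schedulings are equivalent exactly when they order every such pair the same way. The encoding (the Writers program of Sect. 2, with each step given as a (process, variable) write) and all names are illustrative assumptions.

```python
# Writers program: p writes x then y; q writes x then y (steps as (process, var)).
PROGRAMS = {"p": [("p", "x"), ("p", "y")], "q": [("q", "x"), ("q", "y")]}


def interleavings(programs):
    """Yield every scheduling that respects the program order of each process."""
    def go(pcs):
        if all(pcs[p] == len(steps) for p, steps in programs.items()):
            yield []
            return
        for p, steps in programs.items():
            if pcs[p] < len(steps):
                nxt = dict(pcs)
                nxt[p] += 1
                for rest in go(nxt):
                    yield [steps[pcs[p]]] + rest
    yield from go({p: 0 for p in programs})


def interfere(a, b):
    """Writes by different processes interfere iff they target the same variable."""
    return a[0] != b[0] and a[1] == b[1]


def trace_key(schedule):
    """Identify the trace by the order of every pair of interfering steps."""
    return frozenset(
        (a, b)
        for i, a in enumerate(schedule)
        for b in schedule[i + 1:]
        if interfere(a, b)
    )


schedules = list(interleavings(PROGRAMS))
traces = {trace_key(s) for s in schedules}
print(len(schedules), len(traces))  # 6 4
```

Steps of the same process are always ordered by program order, so recording only the cross-process interfering pairs is enough to tell the traces apart here.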

In this paper, we introduce the notion of observability of operations, allowing 
observer operations that appear later in a scheduling to be used when deciding 
whether earlier operations are interfering. We start by explaining observers with 
a series of examples (Sect. 2), and continue by presenting key notions of DPOR 
and explaining why using observers in DPOR algorithms is challenging (Sect. 3). 
We then present a formal framework (Sect. 4) and describe an extension to the 
Optimal DPOR algorithm [2] that enables the use of observers (Sect. 5). The extension
is generic in the sense that it can be applied to several models of concurrency,
such as shared memory and message passing. We demonstrate this claim by two 
implementations: one in an SMC tool for C/C++ programs with pthreads and 
one in an SMC tool for Erlang programs (Sect. 6). Finally, in Sect. 7 we evaluate 
our implementations and show that Optimal DPOR with Observers can achieve 
significantly better reduction in both synthetic and ‘real’ programs. 


2 DPOR and Observers by Example 


Consider the program shown in Fig. 1, in which a main process spawns two
concurrent processes, p and q, which issue write operations on two different 
shared variables x and y. After p and q finish their execution, the main process 
reads the values of x and y and checks a correctness property. A DPOR algorithm 
will begin exploring this program by executing an arbitrary scheduling; see Fig. 1 
(middle). Nodes show the values of the shared variables and each transition 
consists of an execution step. By inspecting the operations in this scheduling, 
the algorithm sees that if the second step of q is scheduled before the second 
step of p, the partial order of the writes to the y variable is different. It therefore 
Initially: x = y = 0
spawn processes p and q;
    p:           q:
    x := 1;      x := 2;
    y := 1       y := 2
join processes p and q;
assert(abs(x - y) < 2);

Fig. 1. Writers program (its correctness property as assertion) and two of its
schedulings.


plans to execute a scheduling in which the second step of p happens after the 
one from q. The start of this scheduling can be denoted as p.q. Similarly, the 
order of the writes on x can be reversed, by executing q’s first step before the 
first step of p. Therefore, a scheduling starting with q should also be explored. In 
Optimal DPOR [2], future explorations are added as partial schedulings, forming 
wakeup trees (shown in blue). These trees are quite trivial in this example, each 
consisting of a single path. 

The algorithm continues exploration from the “deepest” point where a new 
scheduling should be tried; in the example, this is the (1,0) node. A second 
scheduling is explored with the intention to execute some operation before the 
second step of p. Without any other constraint, a non-optimal DPOR algorithm 
could execute p’s second step immediately after the first step of q, ending up 
in a state identical to the previously explored (2,1) and then again in (2,2).
The sleep sets technique [11] can be used to avoid or stop such redundant explo- 
rations. Sleep sets retain information from already explored earlier process steps 
that have not yet been ‘overtaken’ by some step in the current exploration. In 
our example, information about p’s second step is retained in the sleep set until 
some other interfering operation (here q’s second step) has been executed. More- 
over, sleep sets can be used to infer that swapping (again) the second step of p 
and the second step of q (based on their interference in the second scheduling) is 
redundant. Any DPOR algorithm using sleep sets will explore four schedulings 
for this program (instead of the six possible ones). Each of these four schedulings
leads to a different final state. Notice that two writes on the same variable were
always deemed interfering.
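The effect of sleep sets on this program can be reproduced in a few lines of Python. The sketch below is an illustration under simplifying assumptions: it explores all enabled processes at every point (no source sets or wakeup trees), prunes only with sleep sets, encodes each step of the Writers program by the variable it writes, and treats steps of different processes as interfering iff they write the same variable. It reaches four complete schedulings; one further branch, starting with q.q, is blocked because p is asleep there.

```python
# Writers program: variable written by each successive step of each process.
PROGRAMS = {"p": ["x", "y"], "q": ["x", "y"]}


def explore_with_sleep_sets(programs):
    """DFS over schedulings pruned by sleep sets; returns complete schedulings."""
    complete = []

    def next_var(pcs, proc):
        return programs[proc][pcs[proc]]

    def dfs(pcs, sleep, prefix):
        enabled = [p for p in programs if pcs[p] < len(programs[p])]
        if not enabled:
            complete.append(prefix)
            return
        done = set()
        for p in enabled:
            if p in sleep:
                continue  # exploring p here would be redundant
            var = next_var(pcs, p)
            # A sleeping or already explored process stays asleep only if its
            # next step is independent of p's step (writes another variable).
            new_sleep = {s for s in sleep | done if next_var(pcs, s) != var}
            nxt = dict(pcs)
            nxt[p] += 1
            dfs(nxt, new_sleep, prefix + [p])
            done.add(p)

    dfs({p: 0 for p in programs}, set(), [])
    return complete


for scheduling in explore_with_sleep_sets(PROGRAMS):
    print(".".join(scheduling))
# prints p.p.q.q, p.q.q.p, q.p.p.q and q.p.q.p, one per line
```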

Consider now the following program, in which the shared variable x (whose
initial value is 0) is accessed by processes p, q and r:

    [inline program: p and q each write to x; r executes assert(x < 3)]

Here, the correctness property is checked by process r. If interference
is decided using the same criteria as a data race (i.e., two operations interfere if
they access the same memory location and at least one of them is a write), then
all three operations interfere with each other. As a result, each of the 3! = 6
possible interleavings has a different partial order and therefore belongs to a
different Mazurkiewicz trace that should be explored by a DPOR algorithm. In
schedulings starting with r, however, the order of the execution of p and q is 
irrelevant (if one does not care about the final contents of the memory), as the 
values written by these operations will never be read. A DPOR algorithm could 
detect that the written values are not observed and consider the write operations 
as non-interfering. 

Taking this idea further, consider the next example, in which N processes
write on the shared variable x; as a result, there exist N! schedulings:

    [inline program: processes p1, ..., pN each write to x; then:
     join processes p1, ..., pN; assert(x > 0)]

In each such scheduling, however, only the last written value will be read.
A DPOR algorithm could consider write operations that are not subsequently
observed as independent and therefore explore just N instead of N!
schedulings, thereby achieving an exponential reduction.

In the last two examples, better reduction could be obtained if the interference
of write operations, which are conservatively considered as “always interfering”,
were characterized more accurately by looking at complete executions and taking
observability by “future” operations into account. This idea is applicable not
only in shared memory but also in other models of concurrency. In the next
message passing program, processes p and q each send a different message to the
mailbox of process r using the send operator “!”. Process r uses a receive
operation to retrieve a message and store it in a (local) variable x:

    p: r ! M1      q: r ! M2      r: receive x

If we assume that receive operations pick and return the oldest message in the
mailbox or return null if no message exists, send operations can interfere (the
order of delivery is significant) and so can send and receive operations (an
empty mailbox can yield a different value). As a result, six schedulings are
possible. However, only three schedulings really need to be explored: the
receive operation interferes only with the earliest send operation and cannot be
affected by a later send; moreover, if the receive operation is executed first,
the order of the send operations is irrelevant.

If we instead assume that receive operations block if no matching message
exists, only two schedulings need to be explored, as r can receive either M1 or
M2. Again, if we generalize the example to N processes instead of just two, the
behaviour is similar to the program with N writes: only N schedulings (instead
of N!) are relevant, each determined by the first message delivered; the remaining
message deliveries are not observable. Note that, in this concurrency model, we
are interested in the observability of the first instead of the last operation in an
execution sequence.

In some message-passing concurrency models (e.g., in Erlang programs [4]),
it is further possible to use selective receive operations instead, which also
block when no message can be selected. Using this feature, the previous program
can be generalized and rewritten so that r explicitly picks messages in order,
using pattern matching:

    p1: r ! M1     p2: r ! M2     ...     pN: r ! MN
    r:  receive M1; receive M2; ...; receive MN

Here r wants to pick up the N messages in order: first M1, then M2, etc. Thus,
the order of delivery of messages is irrelevant. A DPOR algorithm could take
advantage of the additional information provided by the selective receive
operations, notice that the messages from $p_{i+1}, \dots, p_N$ cannot be selected
before the message from $p_i$, and therefore determine that the N sends are
independent. A single scheduling is enough to explore all behaviours of the
program!


Optimal Dynamic Partial Order Reduction with Observers 233


Having explained the concept of observability of operations by examples, let
us see how it can be combined with the Optimal DPOR algorithm to achieve
such reductions.
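The gap between the N! schedulings and the N observable outcomes in the examples above can be checked by brute force. The Python sketch below assumes, for illustration only, that process p_i writes (or sends) the value i; it enumerates all N! orders and groups them by what the single observer sees: the last write for the final read of x, or the first delivery for a blocking receive.

```python
from collections import Counter
from itertools import permutations


def last_write(order):
    """The final read of x sees the value written last."""
    return order[-1]


def first_delivery(order):
    """A blocking receive of any message takes the first message delivered."""
    return order[0]


def observed_outcomes(n, observer):
    """Count how many of the n! orders lead to each observed value."""
    return Counter(observer(order) for order in permutations(range(1, n + 1)))


n = 4
for observer in (last_write, first_delivery):
    outcomes = observed_outcomes(n, observer)
    print(observer.__name__, sum(outcomes.values()), len(outcomes))
# last_write 24 4
# first_delivery 24 4
```

Each observable outcome is produced by (N-1)! orders that differ only in unobserved operations, which is why one representative per outcome suffices in these examples.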


3 Using Observers in a DPOR Algorithm 


Our objective is to construct a DPOR algorithm that lazily considers interfer- 
ences based on the existence of later operations, called observers. In the simplest 
case, operations that would be conservatively considered interfering are treated 
as independent in the absence of an observer. Examples in Sect. 2 included write 
operations whose values were never read, or cases where the order of message 
deliveries does not affect the order in which the messages are received. 

The intuition behind such an SMC approach comes from the fact that it is 
only operations that observe a value (e.g., assertions, receive statements, etc.) 
that can influence the control flow and lead to erroneous or generally unex- 
pected behaviour. Other operations (e.g., writes, sends, etc.) cannot affect pro- 
gram behaviour if no future operation observes their effects. In such cases, inter- 
ference between those other operations can be ignored. 


3.1 POR Concepts and Optimal DPOR 


The goal of POR techniques is to explore only a (small) subset of the possible
schedulings of a concurrent program, a subset that is sound, i.e., that includes
at least one scheduling from each Mazurkiewicz trace. DPOR algorithms
perform a depth-first exploration of the tree of all possible schedulings.
Reduction is achieved by exploring only a sound subset of all scheduling choices 
that are possible at each point in the tree. Such subsets are formed on the basis 
of two complementary techniques. 


— Each point in the tree is associated with a sleep set, which contains a set of
processes whose exploration would be redundant. More precisely, a process p
is in the sleep set after a sequence of the form E.v if p has previously been
explored after E, and furthermore p does not interfere with v. Thus, exploring
E.v.p is redundant, since it was previously explored after E.p (as E.p.v).
— From each point in the tree, the set of explored processes must form a source
set [2]. (Some DPOR algorithms employ persistent or stubborn sets, which are 
subsumed by source sets.) Source sets have the property that for any extension 
which forms a complete (aka maximal) scheduling, there is an equivalent 
extension in which the next step is taken by a process in the source set. A 
source set is constructed incrementally during the exploration by inspecting 
encountered races: whenever a scheduling of form E.p.v is explored, in which 
the step of p is in a race with some step in v, then the reversal of that race will 
be explored in some other scheduling, where some process q in v is scheduled 
immediately after E: this is achieved by adding q to the source set after E.
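The race-driven construction of source sets described in the second item can be sketched in Python as follows. This is a simplified illustration, not the exact algorithm of [2]: events are hypothetical (process, variable, kind) triples; happens-before is the transitive closure of program order and dependence; two events race if they are dependent, belong to different processes, and are not ordered through an intermediate event; and for each race the sketch adds an initial of the reversing sequence to the source set, ignoring sleep sets and wakeup trees.

```python
def dependent(a, b):
    """Events are (process, variable, kind) with kind 'r' or 'w'."""
    return a[1] == b[1] and "w" in (a[2], b[2])


def happens_before(seq):
    """hb[i][j]: event i happens before event j (program order or dependence,
    closed transitively)."""
    n = len(seq)
    hb = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if seq[i][0] == seq[j][0] or dependent(seq[i], seq[j]):
                hb[i][j] = True
    for k in range(n):  # Warshall's transitive closure
        for i in range(n):
            if hb[i][k]:
                for j in range(n):
                    if hb[k][j]:
                        hb[i][j] = True
    return hb


def races(seq, hb):
    """Pairs (i, j) of dependent events of different processes, i before j,
    with no event k in between such that i happens before k and k before j."""
    found = []
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if (
                seq[i][0] != seq[j][0]
                and dependent(seq[i], seq[j])
                and not any(hb[i][k] and hb[k][j] for k in range(i + 1, j))
            ):
                found.append((i, j))
    return found


def update_source_sets(seq):
    """Source set after each prefix seq[:i], starting from the process actually
    scheduled there and extended so that every race can be reversed."""
    hb = happens_before(seq)
    source = {i: {seq[i][0]} for i in range(len(seq))}
    for i, j in races(seq, hb):
        # v' = events after i that do not happen after i, followed by event j
        v = [k for k in range(i + 1, j) if not hb[i][k]] + [j]
        # initials: processes whose first event in v' has no hb-predecessor in v'
        initials = {
            seq[k][0]
            for idx, k in enumerate(v)
            if not any(hb[m][k] for m in v[:idx])
        }
        if not (source[i] & initials):
            source[i].add(min(initials))  # pick one initial deterministically
    return source


seq = [("p", "x", "w"), ("p", "y", "w"), ("q", "x", "w"), ("q", "y", "w")]
print(races(seq, happens_before(seq)))     # [(0, 2), (1, 3)]
print(sorted(update_source_sets(seq)[0]))  # ['p', 'q']
```

On the Writers scheduling p.p.q.q, both pairs of writes race, so q is added to the source sets after the empty prefix and after p, which is what leads to the schedulings starting with q and with p.q.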


Most existing DPOR algorithms prescribe that from each point in the tree (i) all 
processes in a source set should be explored, and (ii) no process in the sleep set 
should be explored. However, these principles are not sufficient to avoid redun- 
dant exploration [2]. The reason is that the reversal of a race in E.p.v may 
happen only by exploring a particular subsequence of v; since a source set can 
only contain the first step in such a sequence, it cannot prevent continued
exploration beyond that first step from being redundant. Optimal DPOR improves
on earlier techniques by using wakeup trees [2] in addition to sleep sets. Wakeup 
trees are composed of partial execution sequences (called wakeup sequences) 
that (a) reverse the order of the interfering operations, and (b) are provably 
non-redundant. Optimal DPOR, currently the state-of-the-art DPOR algorithm, 
always uses wakeup sequences to explore new schedulings. As a result, Optimal 
DPOR does not even initiate redundant exploration, and can achieve exponential 
reduction over, e.g., the original DPOR [10] or the Source DPOR [2] algorithm.


3.2 Observers and Sleep Sets 


The use of sleep sets is not trivial when using observers, because interference
between events often cannot be determined when they occur, but only later
in the scheduling. Let us illustrate this using an example. In the next program,
three processes (p, q and s) send tagged messages (with tags A and B) to a
receiver process r, which uses selective receive to read matching messages from
its mailbox. Each message also contains the process identifier of the sender.


    p:             q:             s:             r:
    r ! {B,p};     r ! {A,q};     r ! {B,s};     receive {A,x};
    r ! {A,p}                                    if (x == p)
                                                     receive {B,y}


In standard DPOR, the sends are interfering, since the order of delivery can 
affect the values assigned to the x and y variables in r. Using observers, sends 
are interfering only if justified by an observing receive operation. Assume that 
the first explored scheduling is p.p.q.s.r.r. Here, the second send by p (sending 
the message tagged with A) interferes with the send by q, since their order is 
observed by the first receive of r (if the message from q had been delivered first, 
it would have been the one picked instead). Furthermore, the first send by p
(sending the message tagged with B) interferes with the message send by s, since 
they have the second receive of r as observer. In order to explore the reversal of 
the race between the first send of p and that of s, the algorithm needs to explore 
a scheduling in which p’s first send is executed after s. Such a scheduling must 
clearly start with s. The rules for sleep sets prescribe that p should be in the 
sleep set at the start of this exploration, and that p should be removed from 
the sleep set after executing s if p and s interfere. However, this interference is 
visible only later, making it unclear what to do. On the one hand, removing p 
from the sleep set on the grounds that it “might” interfere with s risks exploring
redundant schedulings and defeats the purpose of observers. On the other hand,
keeping p in the sleep set to “see what happens” prevents exploring the effects
of the race reversal, since that requires the second send of p to be explored 
before q, which is forbidden if p remains in the sleep set. Thus, sleep sets are 
not a sufficiently precise mechanism for avoiding redundant exploration without 
missing non-redundant schedulings. 


3.3 Introducing Observers to Optimal DPOR 


We will now explain how Optimal DPOR can be adapted to work with observers. 
There are two main challenges: (1) we need to address the fact that, in the 
presence of observers, interference is conditional, and (2) we also need a suitable 
replacement for sleep sets, since we can no longer use them to guarantee that 
there is no redundant exploration. 

In Optimal DPOR, it is assumed that operations that are interfering in some 
execution sequence remain interfering in any prefix of that sequence. This is 
no longer true when we determine interference by the existence of observing 
operations. If an observer is not included in a prefix of an execution sequence in 
which two operations were observably interfering, the same two operations will be 
independent. To address challenge 1 in Optimal DPOR with observers, we need 
to extend the wakeup sequences constructed for reversing the order of interfering 
operations that require an observer, with a suffix that includes the observer. It 
is allowed for this suffix to include operations happening after the interfering 
operations (even in program order); any such operations will behave identically 
in the reversal because in the original scheduling the observer was the first event 
that could be affected by the ordering of the interfering operations. To address 
challenge 2, we can build on the intuition behind sleep sets and assert that when 
our algorithm is done with a particular state, it has explored all schedulings that 
can start with the step that led to that state. When the algorithm considers a 
new scheduling (based on a wakeup sequence), information about observers in 
that scheduling needs to be recalculated from the operations in the sequence. 
The algorithm can then perform an exhaustive test that ensures that each step
previously explored from any point in the execution is overtaken by some other
step in the wakeup sequence under consideration.
4 Framework


We consider a concurrent system composed of a finite set of processes (or 
threads). Each process executes a deterministic program, in which statements 
act on the global state of the system. Processes can interact via shared variables, 
messages, etc. We assume that the state space does not contain cycles, and that 
executions have bounded length. A step of a process may not disable another 
process. 

Formally, let $X$ be the set of states of a concurrent system and $s_0 \in X$ be the
unique initial state. The partial function $\mathit{execute}_p : X \to X$ describes execution,
representing an atomic execution step of process p, which may depend on and
affect the state. An execution sequence E of the system is a finite sequence of
execution steps of its processes that is performed from the initial state. We use $\langle\rangle$
to denote the empty sequence and . to denote concatenation of sequences of process
steps (e.g., p.p.q denotes the execution sequence where first p performs two
steps, followed by a step of q). The sequence of process steps in E also uniquely
determines the state of the system after E, which is denoted $s_{[E]}$. For a state s,
let $\mathit{enabled}(s)$ denote the set of processes p that are enabled in s (i.e., for which
$\mathit{execute}_p(s)$ is defined). If $p \in \mathit{enabled}(s_{[E]})$, then E.p is an execution sequence.
A sequence E is maximal if $\mathit{enabled}(s_{[E]}) = \emptyset$, i.e., no process is enabled after E.
An event (p, i) of E is a particular occurrence of a process in E, representing
the i-th occurrence of process p in the execution sequence. We use w, w', ... to
range over sequences, e, e', ... to range over events, as well as:


- $E \vdash w$ to denote that E.w is an execution sequence.
- $w \setminus p$ to denote the sequence w with its first occurrence of p removed.
- $\mathit{dom}(E)$ to denote the set of events (p, i) which are in E.
- $\mathit{dom}_{[E]}(w)$ to denote $\mathit{dom}(E.w) \setminus \mathit{dom}(E)$, i.e., the events in E.w which are in w.
- $\mathit{next}_{[E]}(p)$ to denote $\mathit{dom}_{[E]}(p)$ as a special case.
- $\hat{e}$ to denote the process p of an event e = (p, i).
- $e <_E e'$ to denote that e occurs before e' in E, i.e., $<_E$ is the total order of
  events.
- $E' \leq E$ to denote that the sequence E' is a prefix of the sequence E.
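To make the notation concrete, the following minimal Python sketch represents an execution sequence as a list of process names and an event as a pair (p, i) with 1-based i. It assumes that every list passed in is a valid execution sequence (enabledness is not modeled), and all function names are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, List, Set, Tuple

Event = Tuple[str, int]  # (p, i): the i-th occurrence of process p, 1-based


def events(seq: List[str]) -> List[Event]:
    """The events of a sequence of process steps, in the order <_E."""
    counts: Dict[str, int] = {}
    result: List[Event] = []
    for p in seq:
        counts[p] = counts.get(p, 0) + 1
        result.append((p, counts[p]))
    return result


def dom(seq: List[str]) -> Set[Event]:
    """dom(E): the set of events of E."""
    return set(events(seq))


def remove_first(w: List[str], p: str) -> List[str]:
    """w \\ p: w with its first occurrence of p removed (unchanged if p is absent)."""
    w = list(w)
    if p in w:
        w.remove(p)  # list.remove deletes only the first occurrence
    return w


def dom_after(E: List[str], w: List[str]) -> Set[Event]:
    """dom_[E](w) = dom(E.w) \\ dom(E): the events that w adds after E."""
    return dom(E + w) - dom(E)


def next_event(E: List[str], p: str) -> Event:
    """next_[E](p) = dom_[E](p): the event that p performs next after E."""
    (e,) = dom_after(E, [p])
    return e


def is_prefix(E1: List[str], E: List[str]) -> bool:
    """E1 <= E: E1 is a prefix of E."""
    return E[:len(E1)] == E1
```

For example, `events(["p", "p", "q"])` gives the events (p, 1), (p, 2), (q, 1) of the sequence p.p.q from the text.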

— E’ < E to denote that the sequence LE’ is a prefix of the sequence E. 


We assume a function which assigns a happens-before relation [15] to any
execution sequence E, denoted as $\to_E$.

We will keep the general approach of Optimal DPOR and require the 
happens-before relation to satisfy a set of properties, collected in Definition 1. 
These properties are the first point where we diverge from the underlying model 
for Optimal DPOR [2, Definition 3.2]. In that definition, Properties (3) and (5) 
need to be weakened, Property (6) needs to be replaced, whereas Property (7) 
was only required for Source DPOR and is thus dropped. 


Definition 1 (Properties of valid happens-before relations). A happens-before
assignment, which assigns a unique happens-before relation $\to_E$ to any
execution sequence E, is valid if it satisfies the following properties for all
execution sequences E:


1. $\to_E$ is an irreflexive partial order on $\mathit{dom}(E)$, which is included in $<_E$.
2. The execution steps of each process are totally ordered, i.e., $(p, i) \to_E (p, i+1)$
   whenever $(p, i+1) \in \mathit{dom}(E)$.
3. Given an execution sequence E and a process p s.t. $E \vdash p$, then for all events
   $e, e' \in \mathit{dom}(E)$, if $e \to_E e'$ then $e \to_{E.p} e'$.
4. Any linearization E' of $\to_E$ on $\mathit{dom}(E)$ is an execution sequence which has
   exactly the same “happens-before” relation $\to_{E'}$ as $\to_E$. This means that the
   relation $\to_E$ induces a set of equivalent execution sequences, all containing
   the same set of events, and with the same “happens-before” relation. We use:
   - $E \simeq E'$ to denote that $\mathit{dom}(E) = \mathit{dom}(E')$ and that E and E' are
     linearizations of the same “happens-before” relation, and
   - $[E]_\simeq$ to denote the equivalence class of E.
5. If $E \simeq E'$, then $\mathit{enabled}(s_{[E]}) = \mathit{enabled}(s_{[E']})$.

For the last property, we need to introduce a few definitions. Given $\to_E$, if
$e, e' \in \mathit{dom}(E)$ and $e <_E e'$, define:

- $e \lesssim_E e'$ (read as e is in a race with e') to denote that $e \to_E e'$ and
  $\hat{e} \neq \hat{e}'$, and there is no event $e'' \in \mathit{dom}(E)$, different from e' and e,
  such that $e \to_E e'' \to_E e'$.
- $e \lessdot_E e'$ (read as e is in a reversible race with e') to denote that $e \lesssim_E e'$
  and, in any equivalent execution sequence $E' \simeq E$ where e occurs immediately
  before e', $\hat{e}'$ is not blocked before the occurrence of e.

Now we continue listing properties of valid happens-before relations.

6. Given an execution sequence E, then for all events $e, e' \in \mathit{dom}(E)$ where
   $e \lesssim_E e'$, there exists a set $O = \mathit{observers}(e, e', E) \subseteq \mathit{dom}(E)$ such that:
   (a) For all $o \in O$, it holds that $e \to_E o$, $o \neq e'$, and $o \not\to_E e'$.
   (b) For all $o, o' \in O$, it holds that $o \not\to_E o'$.
   (c) If $E' \simeq E$, then $O' = \mathit{observers}(e, e', E') = O$.
   (d) For every prefix $E' \leq E$ of E such that $e, e' \in \mathit{dom}(E')$:
       - If O is empty, then $e \to_{E'} e'$.
       - If O is nonempty, then $e \to_{E'} e'$ iff $\mathit{dom}(E') \cap O \neq \emptyset$.
   (e) If $e \lessdot_E e'$, then for all sequences w such that $E \vdash w$ and all events
       $e'' \in \mathit{dom}(E)$:
       - If $e \not\to_E e''$, then $e \not\to_{E.w} e''$.
       - If $e'' \not\to_E e'$, then $e'' \not\to_{E.w} e'$.
   (f) For all $e'' \in \mathit{dom}(E)$ such that $e' \to_E e''$, it holds that
       $O \cap \mathit{observers}(e', e'', E) = \emptyset$.
   (g) If $O = \{o\}$ and $E = E'.o$ for some o and E', then for any $E'' \simeq E'$,
       either $e \to_{E''.o} e'$ or $e' \to_{E''.o} e$.
We give some intuition for the changed properties. Property 3 requires the
happens-before assignment to maintain edges in extensions, but allows having
fewer edges in prefixes. Property 5 allows execution sequences that reach different
states (due to unobserved races) to be considered equivalent. Property 6
summarizes properties for races that require observers. Most requirements are
intuitive. Property 6.(d) clarifies Property 3: an “observed” race is included in
a sequence only if some observers of the race are also included. Property 6.(e)
prevents extensions to an execution sequence from adding edges to the events of
a reversible race in such a way that the race cannot be reversed. Property 6.(f)
prohibits an observer from creating “dependency chains”. Finally, Property 6.(g)
requires that an observer observes a fixed set of pairs of events in each execution
sequence; a consequence of this is that whether or not some particular race is
observed never depends on the ordering of some other pair of events observed by
the same observer. All these properties are satisfied by “natural” happens-before
assignments for events in message passing programs and most shared memory
programs. Limitations include, e.g., models in which the written memory regions
of two write operations may overlap without being equal; such pairs of operations
need to be treated as unconditionally racing.
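As a point of reference, the following Python sketch covers only the classical special case of Definition 1 in which every observer set is empty. It assumes, purely for illustration, that each step is a hypothetical (process, kind, variable) triple and that two steps interfere iff they belong to the same process or access the same variable with at least one write. Happens-before is then the transitive closure of these conflict edges, from which races ($\lesssim_E$) and equivalence ($\simeq$) follow directly; reversibility and observers are not modeled.

```python
from typing import Dict, List, Set, Tuple

Step = Tuple[str, str, str]   # (process, kind, variable); kind is "r" or "w"
Event = Tuple[str, int]       # (process, i)


def conflicts(a: Step, b: Step) -> bool:
    """Assumed static dependency: same process, or same variable and at least one write."""
    return a[0] == b[0] or (a[2] == b[2] and "w" in (a[1], b[1]))


def happens_before(E: List[Step]) -> Tuple[List[Event], Set[Tuple[Event, Event]]]:
    """Events of E and the transitive closure of the conflict edges (->_E)."""
    counts: Dict[str, int] = {}
    evs: List[Event] = []
    for s in E:
        counts[s[0]] = counts.get(s[0], 0) + 1
        evs.append((s[0], counts[s[0]]))
    preds: List[Set[int]] = [set() for _ in E]   # preds[j]: positions i with evs[i] ->_E evs[j]
    for j in range(len(E)):
        for i in range(j):
            if conflicts(E[i], E[j]):
                preds[j] |= {i} | preds[i]       # direct edge plus everything before i
    return evs, {(evs[i], evs[j]) for j in range(len(E)) for i in preds[j]}


def races(E: List[Step]) -> Set[Tuple[Event, Event]]:
    """Pairs e <~_E e': e ->_E e', different processes, and no event in between."""
    evs, hb = happens_before(E)
    return {(e, f) for (e, f) in hb
            if e[0] != f[0]
            and not any((e, g) in hb and (g, f) in hb for g in evs)}


def equivalent(E1: List[Step], E2: List[Step]) -> bool:
    """E1 ~ E2: same events and the same happens-before relation."""
    evs1, hb1 = happens_before(E1)
    evs2, hb2 = happens_before(E2)
    return set(evs1) == set(evs2) and hb1 == hb2
```

With two writes and a read of x, only the adjacent pairs race: the first write and the read are ordered through the second write, so they are not in a race.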


5 Optimal DPOR with Observers 


We now present a DPOR algorithm with observers that achieves optimal 
reduction. 

In Sect. 3.2 we explained why sleep sets are not suitable when observers are 
used. We instead introduce a notion of redundancy based solely on the set of 
explored steps from each state. We will base this notion on a concept defined in 
Optimal DPOR. 


Definition 2 (Initials and Weak Initials [2]). For an execution sequence
E.w, the set $I_{[E]}(w)$ of processes that are initials and the set $WI_{[E]}(w)$ of
processes that are weak initials are defined as follows:

1. $p \in I_{[E]}(w)$ iff there is a sequence w' such that $E.w \simeq E.p.w'$;
2. $p \in WI_{[E]}(w)$ iff there are sequences w' and v such that $E.w.v \simeq E.p.w'$.
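Under a simplifying assumption, initials and weak initials can be computed syntactically. The sketch below assumes a static read/write conflict relation between hypothetical (process, kind, variable) steps: a process is an initial of w if its first step in w conflicts with no earlier step of w, and a process outside w is additionally a weak initial if its next step (passed in as `next_steps`, another assumption of this sketch) conflicts with no step of w. It illustrates the definitions and is not how either tool computes these sets.

```python
from typing import Dict, List, Set, Tuple

Step = Tuple[str, str, str]  # (process, kind, variable); kind is "r" or "w"


def conflicts(a: Step, b: Step) -> bool:
    """Assumed static dependency: same process, or same variable and at least one write."""
    return a[0] == b[0] or (a[2] == b[2] and "w" in (a[1], b[1]))


def initials(w: List[Step]) -> Set[str]:
    """I_[E](w): processes whose first step in w conflicts with no earlier step of w.
    (A later step of a process always conflicts with that process's earlier step.)"""
    return {s[0] for k, s in enumerate(w)
            if not any(conflicts(t, s) for t in w[:k])}


def weak_initials(w: List[Step], next_steps: Dict[str, Step]) -> Set[str]:
    """WI_[E](w): the initials of w plus every process outside w whose next
    step (given in next_steps) conflicts with no step of w."""
    in_w = {s[0] for s in w}
    return initials(w) | {p for p, s in next_steps.items()
                          if p not in in_w and not any(conflicts(t, s) for t in w)}
```

For w = p:write(x).q:read(x).r:write(y), the initials are p and r; a process s that next reads z is a weak initial, while one that next reads x is not.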


Definition 3 (Redundant Sequences). For an execution sequence E and
a function done from prefixes of E to sets of processes, the set of sequences
$\mathit{redundant}(E, \mathit{done})$ is defined such that $v \in \mathit{redundant}(E, \mathit{done})$ iff E.v is an
execution sequence and there is a partitioning E = w.w' of E such that some
process $p \in \mathit{done}(w)$ is also in $WI_{[w]}(w'.v)$.
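Definition 3 can be read as a loop over all partitionings E = w.w'. The sketch below keeps the same illustrative assumptions (hypothetical (process, kind, variable) steps with a static conflict relation) and additionally assumes that `done[k]` records the step each explored process took after the prefix E[:k], which stands in for that process's next step after w.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str, str]  # (process, kind, variable); kind is "r" or "w"


def conflicts(a: Step, b: Step) -> bool:
    """Assumed static dependency: same process, or same variable and at least one write."""
    return a[0] == b[0] or (a[2] == b[2] and "w" in (a[1], b[1]))


def in_weak_initials(step: Step, u: List[Step]) -> bool:
    """Is the process of `step` (whose next step is `step`) a weak initial of u?"""
    p = step[0]
    for k, s in enumerate(u):
        if s[0] == p:                                 # p occurs in u: must be an initial
            return not any(conflicts(t, s) for t in u[:k])
    return not any(conflicts(t, step) for t in u)     # p absent: must commute with all of u


def redundant(E: List[Step], v: List[Step], done: Dict[int, List[Step]]) -> bool:
    """v in redundant(E, done); done[k] holds the steps explored after the prefix E[:k]."""
    for k in range(len(E) + 1):                       # every partitioning E = w.w', w = E[:k]
        rest = E[k:] + v                              # the sequence w'.v
        if any(in_weak_initials(s, rest) for s in done.get(k, [])):
            return True
    return False
```

If q's independent write of y was already fully explored from the empty prefix, then after p's write of x the continuation q is redundant (p.q is equivalent to q.p), whereas a conflicting read of x by q is not.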


The intuition is that if $v \in \mathit{redundant}(E, \mathit{done})$, then the execution sequence
E.v is equivalent to a previously explored execution sequence. In the special
case where races do not need observers (i.e., the set of observers for each race
is empty), we can define sleep sets in the classical sense by letting $p \in \mathit{sleep}(E)$
denote that E is of form E'.v for some v such that $p \in \mathit{done}(E')$ and p and v
are independent. Then sleep(E) will consist of all single-process sequences in
redundant(E, done), and $v \in \mathit{redundant}(E, \mathit{done})$ is equivalent to $\mathit{sleep}(E) \cap WI_{[E]}(v) \neq \emptyset$.

If E is an execution sequence, and v and w are sequences of processes, let:

- $v \sqsubseteq_{[E]} w$ denote that there is a sequence v' such that E.v.v' and E.w are
  execution sequences with $E.v.v' \simeq E.w$. Intuitively, $v \sqsubseteq_{[E]} w$ if, after E, the
  sequence v is a possible way to start an execution that is equivalent to w.
- $v \sim_{[E]} w$ denote that there are sequences v' and w' such that E.v.v' and E.w.w'
  are execution sequences with $E.v.v' \simeq E.w.w'$. Intuitively, $v \sim_{[E]} w$ if, after E,
  the sequence v is a possible way to start an execution that is equivalent to
  an execution sequence of form E.w.w', and vice versa.
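For a static conflict relation, both relations can be decided step by step: $v \sqsubseteq_{[E]} w$ holds if each step of v can in turn be pulled to the front of what remains of w, and $v \sim_{[E]} w$ additionally allows a step of v that commutes with everything left in w. The following Python sketch assumes that characterization and the same hypothetical (process, kind, variable) steps; it also assumes determinism, so a process's next step has the same label wherever it appears.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str, str]  # (process, kind, variable); kind is "r" or "w"


def conflicts(a: Step, b: Step) -> bool:
    """Assumed static dependency: same process, or same variable and at least one write."""
    return a[0] == b[0] or (a[2] == b[2] and "w" in (a[1], b[1]))


def initial_index(step: Step, w: List[Step]) -> Optional[int]:
    """Position of `step` in w if it is the first step of its process in w and
    conflicts with no earlier step of w; otherwise None."""
    for k, s in enumerate(w):
        if s[0] == step[0]:
            if s == step and not any(conflicts(t, s) for t in w[:k]):
                return k
            return None
    return None


def starts_equivalent(v: List[Step], w: List[Step]) -> bool:
    """v ⊑_[E] w: every step of v can in turn be pulled to the front of w."""
    w = list(w)
    for s in v:
        k = initial_index(s, w)
        if k is None:
            return False
        del w[k]
    return True


def compatible(v: List[Step], w: List[Step]) -> bool:
    """v ~_[E] w: each step of v is an initial of the remaining w or commutes with all of it."""
    w = list(w)
    for s in v:
        k = initial_index(s, w)
        if k is not None:
            del w[k]
        elif any(conflicts(s, t) for t in w):
            return False
    return True
```

The function names are hypothetical; `starts_equivalent` stands for $\sqsubseteq_{[E]}$ and `compatible` for $\sim_{[E]}$.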


Let us define an ordered tree as a pair $(B, \prec)$, where B (the set of nodes) is
a finite prefix-closed set of sequences of processes, with the empty sequence $\langle\rangle$
being the root. The children of a node w, of form w.p for some process
p, are ordered by $\prec$. In $(B, \prec)$, this ordering between children is
extended to a total order $\prec$ on B by letting $\prec$ be the induced post-order
relation between the nodes in B. This means that if the children $w.p_1$ and $w.p_2$
are ordered as $w.p_1 \prec w.p_2$, then $w.p_1 \prec w.p_2 \prec w$ in the induced post-order.
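An ordered tree of this kind can be sketched in Python as a trie whose children are kept in a dictionary, relying on the fact that Python dictionaries preserve insertion order; that insertion order plays the role of the sibling order $\prec$ here (an assumption of the sketch). A post-order traversal then enumerates B in the induced total order.

```python
from typing import Dict, Iterable, Iterator, Tuple


class Node:
    """A node of an ordered tree; dict insertion order is the order < on children."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}


def add_sequence(root: Node, seq: Iterable[str]) -> None:
    """Add seq, and with it all of its prefixes, to the prefix-closed set B."""
    node = root
    for p in seq:
        node = node.children.setdefault(p, Node())


def post_order(node: Node, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Enumerate B in the induced post-order: ordered children before their parent."""
    for p, child in node.children.items():
        yield from post_order(child, prefix + (p,))
    yield prefix
```

Adding p.q and then r yields the order p.q, p, r, ⟨⟩, i.e., w.p ≺ w.r ≺ w for the root w = ⟨⟩, as described above.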


Definition 4 (Wakeup Tree). Let E be an execution sequence, and done be
a function from prefixes of E to sets of processes. A wakeup tree after (E, done)
is an ordered tree $(B, \prec)$ such that the following properties hold:

1. No leaf w of B is redundant after E, i.e., $w \notin \mathit{redundant}(E, \mathit{done})$;
2. whenever u.p and u.w are nodes in B with $u.p \prec u.w$, and u.w is a leaf, then
   $p \notin WI_{[E.u]}(w)$.

Property (2) is the same as in Optimal DPOR; Property (1) has been modified.

Regarding inserting sequences in a wakeup tree, let $(B, \prec)$ be a wakeup tree
after (E, done). For any sequence w such that $w \notin \mathit{redundant}(E, \mathit{done})$ we need
an operation $\mathit{insert}_{[E]}(w, (B, \prec))$ that satisfies the following properties:

1. $\mathit{insert}_{[E]}(w, (B, \prec))$ is also a wakeup tree after (E, done),
2. any leaf of $(B, \prec)$ remains a leaf of $\mathit{insert}_{[E]}(w, (B, \prec))$, and
3. $\mathit{insert}_{[E]}(w, (B, \prec))$ contains a leaf u with $u \sim_{[E]} w$.

The $\mathit{insert}_{[E]}(w, (B, \prec))$ operation can be implemented as follows. Let v be
the smallest (w.r.t. $\prec$) sequence in B such that $v \sim_{[E]} w$. If v is a leaf,
$\mathit{insert}_{[E]}(w, (B, \prec))$ can leave the tree unmodified. Otherwise, let w' be a shortest
sequence such that $w \sqsubseteq_{[E]} v.w'$, and add v.w' as a new leaf, ordered after all
already existing nodes in B of form v.w''.
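The insertion procedure above can be sketched in Python under the same illustrative assumptions as before: hypothetical (process, kind, variable) steps, a static conflict relation, and a trie whose dictionary insertion order is the sibling order. The helper `residual(v, w)` returns None unless $v \sim_{[E]} w$, and otherwise the shortest w' with $w \sqsubseteq_{[E]} v.w'$ under this static characterization. Because new children are appended to the dictionary, the new branch v.w' ends up ordered after all existing nodes of form v.w''. This is a sketch of the described procedure, not code from either tool.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Step = Tuple[str, str, str]  # (process, kind, variable); kind is "r" or "w"


class Node:
    """Node of an ordered tree; dict insertion order is the sibling order."""

    def __init__(self) -> None:
        self.children: Dict[Step, "Node"] = {}


def conflicts(a: Step, b: Step) -> bool:
    """Assumed static dependency: same process, or same variable and at least one write."""
    return a[0] == b[0] or (a[2] == b[2] and "w" in (a[1], b[1]))


def initial_index(step: Step, w: List[Step]) -> Optional[int]:
    """Position of `step` in w if it is an initial of w, otherwise None."""
    for k, s in enumerate(w):
        if s[0] == step[0]:
            if s == step and not any(conflicts(t, s) for t in w[:k]):
                return k
            return None
    return None


def residual(v, w) -> Optional[List[Step]]:
    """None unless v ~ w; otherwise the shortest w' such that w ⊑ v.w'."""
    w = list(w)
    for s in v:
        k = initial_index(s, w)
        if k is not None:
            del w[k]                    # s also occurs in w as an initial
        elif any(conflicts(s, t) for t in w):
            return None                 # s can neither be matched nor commuted past w
    return w


def post_order(node: Node, prefix: Tuple[Step, ...] = ()) -> Iterator[Tuple[Step, ...]]:
    for step, child in node.children.items():
        yield from post_order(child, prefix + (step,))
    yield prefix


def insert(root: Node, w: List[Step]) -> Node:
    """Hypothetical insert_[E](w, tree) following the procedure in the text."""
    for v in post_order(root):          # nodes from smallest to largest w.r.t. <
        rest = residual(v, w)
        if rest is not None:            # v is the smallest node with v ~ w
            break                       # the root always qualifies, so v is always bound
    node = root
    for s in v:
        node = node.children[s]
    if node.children:                   # v is not a leaf: add v.w' as its last branch
        for s in rest:
            node = node.children.setdefault(s, Node())
    return root
```

For instance, with a single branch p:write(x), inserting q:write(y) leaves the tree unchanged (the leaf p is compatible with it), while inserting q:write(y).r:read(x) adds a new branch after p, because r's read of x conflicts with p's write.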


5.1 Algorithm 


Algorithm 1 is a modified and extended version of the plain Optimal DPOR
algorithm [2] so that it supports observers. Since sleep sets are no longer an
applicable mechanism for avoiding redundant exploration, the algorithm accepts only
two arguments: E, the prefix to explore, and WuT, the initial wakeup tree after
E. It keeps two global variables: wut, a mapping from execution sequences to
wakeup trees, and done, a mapping from execution sequences to sets of
processes. For a pair of events $e, e' \in \mathit{dom}(E)$ that are in a reversible race ($e \lessdot_E e'$)
in E, the algorithm employs the following notation:


- pre(E, e) denotes the prefix of E up to, but not including, the event e,
- notdep(e, E) denotes the sub-sequence of E consisting of the events that occur
  after e but do not “happen after” e (i.e., the events e' that occur after e such
  that $e \not\to_E e'$),
- notobs(e, e', E) denotes the sub-sequence of E containing the events that
  “happen after” e, but are not observers $o \in O = \mathit{observers}(e, e', E)$ of the
  race $e \lessdot_E e'$, nor “happen after” any such o:
  $\mathit{notobs}(e, e', E) = \langle q \mid q \in E,\ e \to_E q,\ q \notin O,\ \nexists o \in O.\ o \to_E q \rangle$.
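For concreteness, the three helpers can be sketched in Python over a fixed execution sequence E, given as a list of process names. The sketch assumes that the happens-before relation is supplied as a transitively closed set of position pairs and that the observer set observers(e, e', E) is also supplied as positions; both representations are assumptions made for illustration.

```python
from typing import List, Set, Tuple

HB = Set[Tuple[int, int]]  # (i, j): the event at position i happens before the one at j


def pre(E: List[str], e: int) -> List[str]:
    """pre(E, e): the prefix of E before the event at position e."""
    return E[:e]


def notdep(e: int, E: List[str], hb: HB) -> List[str]:
    """notdep(e, E): steps after e that do not happen after e."""
    return [E[i] for i in range(e + 1, len(E)) if (e, i) not in hb]


def notobs(e: int, E: List[str], hb: HB, observers: Set[int]) -> List[str]:
    """notobs(e, e', E): steps that happen after e, are not observers,
    and happen after no observer (observers = observers(e, e', E))."""
    return [E[i] for i in range(len(E))
            if (e, i) in hb and i not in observers
            and not any((o, i) in hb for o in observers)]
```

With E = p.q.r.s, where p's write (position 0) races with q's write (position 1) and s's read (position 3) is the only observer, notdep yields r and notobs yields q.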


Algorithm 1. Optimal DPOR with Observers.
Initial call: Explore($\langle\rangle$, $(\{\langle\rangle\}, \emptyset)$)

 1 Explore(E, WuT)
 2   done(E) := $\emptyset$;
 3   if $\mathit{enabled}(s_{[E]}) = \emptyset$ then   // Race detection only at maximal execution sequences
 4     foreach $e, e' \in \mathit{dom}(E)$ such that $e \lessdot_E e'$ do   // For each racing pair e, e'
 5       let E' = pre(E, e);   // Go to the state before e
 6       if observers(e, e', E) $\neq \emptyset$ then   // Is $e \to_E e'$ an observed race?
 7         choose $o \in$ observers(e, e', E);   // Select an arbitrary observer as a witness
 8         let v = notdep(e, E).$\hat{e}'$.(notobs(e, e', E) \ $\hat{e}'$).$\hat{o}$;   // Find events that don't observe $e \to_E e'$
 9       else   // If $e \to_E e'$ is a race that needs no observer
10         let v = notdep(e, E).$\hat{e}'$;   // Find events independent of e
11       if v $\notin$ redundant(E', done) then   // Has no equivalent already been explored?
12         wut(E') := insert$_{[E']}$(v, wut(E'));   // If not, insert into the wakeup tree
13   else   // If not at a maximal execution sequence, explore...
14     if WuT $\neq (\{\langle\rangle\}, \emptyset)$ then
15       wut(E) := WuT;   // ... either using an existing wakeup tree
16     else
17       choose $p \in \mathit{enabled}(s_{[E]})$;   // ... or by selecting an arbitrary p...
18       wut(E) := $(\{\langle\rangle, p\}, \{(p, \langle\rangle)\})$;   // ... and making a wakeup tree from it
19     while $\exists p \in$ wut(E) do   // While the wakeup tree is not empty...
20       let p = $\min_\prec \{p \in \mathit{wut}(E)\}$;   // ... pick next branch, ...
21       let WuT' = subtree(wut(E), p);   // ... compute next wakeup tree (a subtree of the current), ...
22       Explore(E.p, WuT');   // ... and do a recursive call to Explore
23       remove all sequences of form p.w from wut(E);   // When done, clean up...
24       add p to done(E);   // ... and mark p as explored


The first change compared to Optimal DPOR is in lines 6 to 8, which describe
how to construct a wakeup sequence for an observed race, including an observer
operation. Second, the test $v \in \mathit{redundant}(E', \mathit{done})$ on line 11 replaces the
test $\mathit{sleep}(E') \cap WI_{[E']}(v) \neq \emptyset$ at the corresponding place in Optimal DPOR.
The rest of the algorithm is essentially the same, with the initialization, update, and
propagation of sleep sets removed.




5.2 Correctness and Optimality 


The correctness and optimality of Algorithm 1 are stated in the following
theorems.


Theorem 1 (Correctness of Optimal DPOR with Observers). Whenever
a call to Explore(E, WuT) returns during Algorithm 1, then for all maximal
execution sequences E.w, the algorithm has explored some execution sequence E'
which is in $[E.w]_\simeq$.


Since the initial call to the algorithm is Explore($\langle\rangle$, $(\{\langle\rangle\}, \emptyset)$),
Theorem 1 implies that for all maximal execution sequences E the
algorithm explores some execution sequence E' which is in $[E]_\simeq$.


Theorem 2 (Optimality of Optimal DPOR with Observers). 
Algorithm 1 never explores two maximal execution sequences which are equiv- 
alent. 


If Algorithm 1 is not at the end of a maximal sequence, it continues
exploring the scheduling either by using information from a wakeup tree (line 15)
or by choosing an arbitrary enabled process (lines 17 and 18). Theorem 2 ensures that
all maximal execution sequences reached are non-redundant.


6 Implementations 


We have implemented Algorithm 1 in two SMC tools: Nidhugg and Concuerror. 


Observers in Nidhugg. Nidhugg [1] is a stateless model checking tool for
shared-memory pthreads programs written in C or C++ that operates by
interpreting LLVM IR. Nidhugg can also test programs under relaxed memory models
(TSO, PSO, and Power), but in this paper we limit ourselves to testing
programs under Sequential Consistency.

In the context of shared memory, the observers extension was used to make 
races between writes to the same memory location conditional on the existence 
of a read of that memory location that “observes” those writes. In order to 
add the observers extension to Nidhugg, the tool was first extended to support 
Optimal DPOR, as it previously only implemented Source DPOR, which is not 
easily extended with observers, as discussed in Sect. 3.2. The tool now records 
symbolic representations of program events that contain enough information 
to reconstruct the happens-before relation induced by a particular execution. 
For Source DPOR, these symbolic events are unnecessary if the happens-before 
relation is stored in vector clocks [18], as it is in Nidhugg. For Optimal DPOR, 
symbolic events are the most reasonable way to implement tests that check 
whether a given process is a weak initial of some sequence, which is needed for 
both the redundancy check and wakeup tree insertion. 

To extend this implementation with observers, symbolic events for writes
were extended with an “observed” flag, which is unset until a read that reads
the value written by that write is executed. At the end of the execution, we
compute the vector clocks of the happens-before relation, only considering two write
events to the same memory location as interfering if at least one of them has the
“observed” flag set. Then, Optimal DPOR was modified as described in Sect. 5.1.
The check on line 11 of whether a wakeup sequence is redundant is implemented
using sleep sets extended with processes that conditionally sleep unless an address
is read, and with a set of addresses that must be read, without intervening writes,
before the end of the program.
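The two-pass scheme described above (set the “observed” flag during execution, then compute vector clocks that order two writes to the same location only if one of them is observed) can be sketched as follows. This is a hypothetical Python illustration over a recorded trace of (process, kind, variable) steps, not Nidhugg's C++ code; it assumes that each read reads the latest earlier write to its variable (Sequential Consistency) and that a read and a write of the same variable always interfere.

```python
from typing import Dict, List, Set, Tuple

Step = Tuple[str, str, str]  # (process, kind, variable); kind is "r" or "w"


def vector_clocks(trace: List[Step], procs: Set[str]) -> List[Dict[str, int]]:
    # Pass 1: set the "observed" flag on each write that some later read reads from.
    observed: Set[int] = set()
    last_write: Dict[str, int] = {}
    for i, (_, kind, var) in enumerate(trace):
        if kind == "w":
            last_write[var] = i
        elif var in last_write:
            observed.add(last_write[var])

    def interferes(i: int, j: int) -> bool:
        (pi, ki, vi), (pj, kj, vj) = trace[i], trace[j]
        if pi == pj:
            return True                             # program order
        if vi != vj or ki == kj == "r":
            return False                            # different variables, or two reads
        if ki == kj == "w":
            return i in observed or j in observed   # conditional write-write race
        return True                                 # read-write pairs always interfere

    # Pass 2: each event's clock joins the clocks of all earlier interfering events.
    clocks: List[Dict[str, int]] = []
    for j, (p, _, _) in enumerate(trace):
        vc = {q: 0 for q in procs}
        for i in range(j):
            if interferes(i, j):
                for q in procs:
                    vc[q] = max(vc[q], clocks[i][q])
        vc[p] += 1
        clocks.append(vc)
    return clocks


def happens_before(trace: List[Step], clocks: List[Dict[str, int]], i: int, j: int) -> bool:
    """Event i ->_E event j iff i precedes j and j's clock has seen i's own component."""
    p = trace[i][0]
    return i < j and clocks[i][p] <= clocks[j][p]
```

In the trace p:write(x), q:write(x), s:read(x) the two writes are ordered because s observes q's write, while in the prefix without the read they are not, in line with Property 6.(d).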


Observers in Concuerror. Concuerror [8] is a stateless model checking tool 
for Erlang, a functional programming language based on the actor model of 
concurrency [4]. In Erlang, actors are realized by language-level processes imple- 
mented by the runtime system instead of being directly mapped to OS threads. 
Each Erlang process communicates with other processes via asynchronous mes- 
sage passing. Messages are placed in the mailbox of the receiving process in 
the order they are delivered. A process can consume messages using selective 
receive, which is a blocking operation when the mailbox does not contain any 
matching message, unless a timeout clause is specified. If multiple messages can 
match, the oldest message is picked from the mailbox. 
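The selective-receive rule can be illustrated with a small Python model of a mailbox, in which each receive pattern is represented as a hypothetical predicate on messages. Timeout clauses are not modeled: a receive with no matching message simply reports that it would block.

```python
from typing import Callable, List, Optional, Tuple

Pattern = Callable[[object], bool]


def selective_receive(mailbox: List[object],
                      patterns: List[Pattern]) -> Optional[Tuple[object, int]]:
    """Consume the oldest message that matches some pattern.

    Returns (message, index of the first matching pattern) and removes the
    message from the mailbox, or None if no message matches (the receive blocks).
    """
    for pos, msg in enumerate(mailbox):       # the mailbox is kept in delivery order
        for k, matches in enumerate(patterns):
            if matches(msg):
                del mailbox[pos]
                return msg, k
    return None
```

Messages that do not match stay in the mailbox in their original order, which is why the relative order of two deliveries only matters when some receive could pick either of them.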

Concuerror already implemented Optimal DPOR, but treated any two mes- 
sage deliveries to the same mailbox as interfering. With the extension, Concuer- 
ror uses receives as observers of sends. When examining a complete scheduling, 
an extra pass is performed, annotating each message delivery event with the pat- 
terns that were used in the receive that picked the message (if present) and 
the receive order. If the message of a later delivery matches any of the pattern 
annotations of an earlier delivery, the deliveries interfere. The notobs sequence 
is constructed from all the events that lead up to the corresponding receive 
(which is the observer), excluding events in the notdep sequence. Because the 
resulting wakeup sequence contains fewer events, observer information is recom- 
puted, and then all the earlier done sets are checked for weak initials of the 
wakeup sequence, exactly as described in Algorithm 1. 
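The interference test on annotated deliveries can be sketched in Python as follows. The dictionary fields and the representation of receive patterns as predicates are assumptions made for illustration, and the receive-order annotation mentioned above is not modeled.

```python
from typing import Callable, Dict, List, Tuple

Pattern = Callable[[object], bool]


def interfering_deliveries(deliveries: List[Dict]) -> List[Tuple[int, int]]:
    """deliveries: the delivery events of one complete scheduling, in order, each a
    dict with 'to' (receiving mailbox), 'msg' (the message), and 'patterns' (the
    patterns of the receive that consumed it; empty if it was never received).
    A later delivery interferes with an earlier one to the same mailbox iff its
    message matches one of the earlier delivery's pattern annotations."""
    pairs = []
    for i, earlier in enumerate(deliveries):
        for j in range(i + 1, len(deliveries)):
            later = deliveries[j]
            if later["to"] == earlier["to"] and any(
                    match(later["msg"]) for match in earlier["patterns"]):
                pairs.append((i, j))
    return pairs
```

When each receive selects a message from one specific sender, no pair interferes; when a receive accepts any message, the two deliveries interfere.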


7 Experimental Results 


We report experimental results that compare the performance of two algorithms:
Optimal DPOR (denoted in the tables as “optimal”) and Optimal DPOR with
Observers (denoted as “observers”). We ran all benchmarks on a desktop with
an i7-3770 CPU (3.40 GHz) with 16 GB of RAM running Debian 4.12.0-2-amd64 
and LLVM 3.8.1. The machine has four physical cores, but presently both tools 
use only one of them. 


Observers in Nidhugg. Table 1 shows the effect of observers on shared-memory
C/pthread programs. We used two kinds of programs: (1) synthetic
benchmarks similar to those of Sect. 2, and (2) programs from SV-COMP and/or
from “similar” papers. We report the number of traces that the two algorithms
explore, the time it takes to explore them, and the memory used (although this
number is not interesting for an SMC tool).


Table 1. Performance of Optimal DPOR vs. Optimal DPOR with Observers in Nidhugg.

                   Traces Explored           Time                  Memory
Benchmark          optimal     observers     optimal   observers   optimal   observers
lastwrite(2)       2           2             <0.1s     <0.1s       10MB      10MB
lastwrite(7)       5040        7             0.5s      <0.1s       10MB      10MB
lastwrite(8)       40320       8             5.2s      <0.1s       10MB      10MB
lastwrite(9)       362880      9             52.0s     <0.1s       10MB      10MB
floating_read(2)   6           5             <0.1s     <0.1s       10MB      10MB
floating_read(6)   5040        193           0.5s      <0.1s       11MB      10MB
floating_read(7)   40320       449           5.0s      <0.1s       11MB      11MB
floating_read(8)   362880      1025          53.3s     0.2s        11MB      11MB
apr_1              1145        1145          4.8s      5.0s        19MB      20MB
fib                218243      218243        18.9s     20.1s       11MB      11MB
lamport(2)         16+16       14+12         <0.1s     <0.1s       10MB      10MB
lamport(3)         9216+11525  5466+6132     4.0s      2.7s        11MB      11MB


lastwrite(n). A synthetic program where n threads write to a shared variable x and a 
single process first joins (awaits the termination of) the writing threads, and then 
reads that variable. 

floating_read(n). A synthetic program where n threads write to a shared variable x and
a single process reads that variable without waiting for the writing processes to
exit.

apr_1. A benchmark adapted from the sources of the Apache Portable Runtime library
version 1.5.1. Also used in [1], where it was called apr_1.c. (Here, no loop bounding was
applied.)

fib. A benchmark from SV-COMP, also used in [1] where it was called fib_true.c. 

lamport(n). This standard benchmark has n worker processes acquiring a mutex
implemented by Lamport’s second fast mutual exclusion protocol [16] and immediately
releasing it. We show it for 2 and 3 processes, which are the only sizes that are both
non-trivial and tractable.


For lastwrite(n), we see a reduction in the number of interleavings explored
from n! to n, as explained in Sect. 2. For floating_read(n), optimal shows the
predicted (n + 1)! interleavings, and for n = 2, observers reduce the interleaving
count from 6 to 5 as expected. In general, the benchmark has $n \cdot 2^{n-1} + 1$
interleavings with observers. Notice that any technique that differentiates
equivalence classes by the partial order of program steps must explore at least as
many interleavings or violate Property 4. The next two programs (apr_1 and fib)
are examples of programs for which observers have no effect. We see that the
extra overhead is very moderate for both programs.
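The trace counts of the synthetic benchmarks can be cross-checked for small n by brute force: enumerate every interleaving and count the distinct happens-before relations. The sketch below is a hypothetical checker, not part of either tool. It assumes a simplified memory model in the spirit of the Nidhugg implementation in Sect. 6: a read and a write of the same variable always interfere, two writes interfere only if one of them is read from when observers are enabled (and always otherwise), and a join orders all steps of the joined thread before the joining one. Since it enumerates (n + 1)! orders, it is only usable for tiny n.

```python
from itertools import permutations


def count_classes(program, joins, observers):
    """Number of distinct happens-before relations over all complete interleavings.

    program:   process -> list of (kind, var) steps, kind in {"r", "w"}
    joins:     process -> processes that must finish before it takes any step
    observers: if True, two writes interfere only if one of them is read from
    """
    steps = [p for p, ops in program.items() for _ in ops]
    relations = set()
    for order in set(permutations(steps)):
        done = {p: 0 for p in program}
        events = []                                  # (process, index, kind, var)
        for p in order:
            if any(done[q] < len(program[q]) for q in joins.get(p, ())):
                break                                # p ran before a joined process finished
            kind, var = program[p][done[p]]
            events.append((p, done[p], kind, var))
            done[p] += 1
        else:
            relations.add(happens_before(events, joins, observers))
    return len(relations)


def happens_before(events, joins, observers):
    observed = set()                                 # positions of writes that are read from
    for k, (_, _, kind, var) in enumerate(events):
        if kind == "r":
            writes = [j for j in range(k) if events[j][2:] == ("w", var)]
            if writes:
                observed.add(writes[-1])             # a read sees the latest earlier write
    preds = [set() for _ in events]
    for b, (pb, _, kb, vb) in enumerate(events):
        for a in range(b):
            pa, _, ka, va = events[a]
            if pa == pb or pa in joins.get(pb, ()):
                dep = True                           # program order or join
            elif va != vb or ka == kb == "r":
                dep = False                          # different variables, or two reads
            elif ka == kb == "w" and observers:
                dep = a in observed or b in observed
            else:
                dep = True
            if dep:
                preds[b] |= {a} | preds[a]
    return frozenset((events[a][:2], events[b][:2])
                     for b in range(len(events)) for a in preds[b])


def benchmark(n, join_reader):
    """lastwrite(n) if join_reader, else floating_read(n): n writers and one reader of x."""
    program = {f"w{i}": [("w", "x")] for i in range(n)}
    joins = {"r": set(program)} if join_reader else {}
    program["r"] = [("r", "x")]
    return program, joins
```

Under these assumptions, `count_classes(*benchmark(3, True), False)` and `count_classes(*benchmark(3, True), True)` give 6 and 3 for lastwrite(3), and the floating_read(3) counts are 24 and 13, i.e., (n + 1)! and $n \cdot 2^{n-1} + 1$.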

In the last benchmark (lamport), we see that observers improve performance.
As Nidhugg does not implement await statements (which are used by lamport),
it emulates these with assumes. In such cases, Nidhugg might explore some traces
in which these assumptions are violated. We list those traces separately, so for
this benchmark the “Traces Explored” columns show a + b entries, which means
that Nidhugg explored a + b traces but in b of them an assume statement
was violated.


Observers in Concuerror. Table 2 shows the effect of observers in message
passing programs; we omit memory used, as both algorithms have similar
requirements.


Table 2. Comparison of Optimal DPOR vs. Optimal DPOR with Observers in
Concuerror.

                   Traces Explored     Time                              Traces Explored       Time
Benchmark          optimal  observers  optimal  observers  Benchmark     optimal   observers  optimal  observers
not_selective(2)   2        2          <1.0s    <1.0s      lock(3)       30        6          0.9s     0.9s
not_selective(6)   720      720        1.7s     1.8s       lock(4)       336       24         1.4s     0.9s
not_selective(7)   5040     5040       6.0s     6.8s       lock(5)       5040      120        9.0s     1.3s
not_selective(8)   40320    40320      48.0s    56.0s      lock(6)       95040     720        3m27s    2.6s
selective(2)       2        1          <1.0s    <1.0s      poolboy       746       265        6.6s     4.0s
selective(6)       720      1          1.8s     <1.0s
selective(7)       5040     1          6.3s     <1.0s      gproc         1168      784        127s     100s
selective(8)       40320    1          51.0s    <1.0s      corfu-repair  92750496  3864604    1022h    52h


not_selective(n). n processes send messages to a process that can receive any message
sent to it.

selective(n). This is a generalized version of the last example of Sect. 2. A process uses 
pattern matching to choose between messages from n different senders. 

lock(n). This is a program in which n workers acquire and release a lock simulated by 
an Erlang process. When using observers, it has n! schedulings. Without observers 
the number of schedulings is higher. 

poolboy. A benchmark created from a unit test of a worker pool library [2]. 

gproc. A benchmark created from a library implementing an extended process dictio- 
nary [2]. 

corfu-repair. A program that verifies the correctness of a repair protocol of CORFU, a 
distributed database using a variant of Chain Replication. From a paper [5] that 
motivated our work. 


The two benchmarks in the left sub-table confirm the behaviour we expect.
When receives are not selective, the number of traces explored by both
algorithms is n!. With selective receive (the selective benchmark), the algorithm
with observers explores only one trace.

The first program in the right sub-table (lock) is originally a shared-memory
program that, when translated to Erlang, simulates locks using message passing.
To acquire the lock, a process sends a message with its identifier to the “lock
process” and then waits for a reply. Upon receiving the acquire message, the
lock process uses the identifier to reply and then waits for a release message.
Other acquire messages become queued in the mailbox of the lock process.
Upon receiving the release message, the lock process loops back to the start,
retrieving the next acquire message and notifying the next process. Notice that,
without observers, the delivery of the release message of a process interferes
(redundantly) with the delivery of acquire messages of other processes, unlike
acquire operations on true locks, which cannot be executed before a release
operation (such messages were treated exceptionally in the evaluation of Optimal
DPOR). Observers remove the need for special handling: the receive statements
are enough to precisely determine which pairs of send operations are interfering.

The next two table rows (poolboy and gproc) show results from “real” Erlang 
programs. We see that observers provide a moderate reduction both in the
number of traces that need to be explored and in time.

Finally, the last program (corfu-repair) is the one that triggered this work. As 
can be seen in the table, observers allow Concuerror to complete SMC in a bit 
more than two days, while without observers the tool needs to explore exactly 
24 times as many traces, taking more than 42 days to finish. 


8 Related Work 


POR techniques have been continuously evolving w.r.t. how they determine
interference. Refining the conditions under which higher-level operations
interfere has been shown to have significant impact, regardless of whether the states in
which such operations are executed are also a parameter or not [13]. In this work,
we have extended this idea, parameterizing the interference between operations
using distinct observer operations.

DPOR techniques have also been extended to take advantage of special prop- 
erties of the underlying concurrency model. For the actor model, the transitivity 
of the dependency relation for send operations has been exploited to defer early 
planning of interleavings [21]. This improvement is orthogonal to Optimal
DPOR (and with our extension), as it reduces the number of wakeup sequences 
that are added “early” in an exploration. For event-driven systems, it has been 
shown [17] that two post operations to an event dispatch queue need not be 
considered dependent: reordering of such operations can be decided later, upon 
detection of interference between other operations within the respective event 
handlers. However, this treatment applies only under a specific interpretation 
of ‘message passing’ that exploits additional semantic structure of an actor’s 
mailbox. Our technique is applicable to a wider spectrum of programs. 

Context-Sensitive DPOR [3] uses an external procedure to decide whether 
alternative schedulings would lead to identical states and, like Optimal DPOR
with observers, is also able to achieve exponential reduction in certain cases.
However, since it needs to compare states, it is an inherently stateful technique, 
in contrast to our technique that inspects only one trace at a time to lazily 
construct reversible races. 

Data-Centric DPOR (DC-DPOR) [7] is an SMC technique that explores 
a related but different notion of observability. It defines two executions to be 
equivalent if each read reads from (“observes”) the same write in both executions. 
In contrast, our notion of observability is based on observing interference of 
operations, not just individual writes. DC-DPOR’s resulting equivalence relation
is coarser than ours, which is based on Mazurkiewicz traces. However, DC-DPOR 
is optimal only for programs with acyclic communication graphs, while being 
non-optimal otherwise. Also, DC-DPOR models message passing using locks 
and shared memory, which at best gives as few traces as Optimal DPOR gives 
without the improvements presented in this paper. 


9 Concluding Remarks 


In this paper we presented an extension to the Optimal DPOR algorithm for 
SMC that uses observability to refine which operations are considered as inter- 
fering. We described the challenges and motivated the necessary modifications, 
gave a formal description of the algorithm and the theory behind it and reported 
on two implementations in SMC tools, demonstrating that Optimal DPOR with 
Observers can achieve significantly better reduction in both shared memory and 
message passing programs. 
available in the Figshare repository [6]. Also included in the artifact are instructions
on how to use it to reproduce the results reported in this paper. As per the TACAS 
Abstract. Data flow analysis (DFA) is an important verification technique
that computes the effect of data values propagating over program
paths. While more precise than flow-insensitive analyses, such an analy- 
sis is time-consuming. 

This paper investigates the acceleration of DFA by structural decom- 
position of the underlying control flow graph. Specifically, we explore the 
cost and effectiveness of dividing program paths into subsets by parti- 
tioning path suffixes at conditional statements, applying a DFA on each 
subset, and then combining the resulting invariants. This yields a family 
of independent DFA problems that are solved in parallel and where the 
partial results of each problem represent safe program invariants. 

Empirical evaluations reveal that, depending on the DFA type and
its conditional implementation, the invariants for a large fraction of
program points can be computed in less time than traditional DFA. This
work suggests a strategy for an “anytime DFA” algorithm: computing 
safe program invariants as the analysis proceeds. 


1 Introduction 


Software developers use static analyses as a supplement to traditional dynamic 
testing approaches. Tools such as AbsInt Astrée [1], Facebook Infer [2], and 
MathWorks Polyspace¹ are becoming standard parts of development workflows.
Advances in program analysis and theorem proving have helped static program 
analysis become more feasible for verification of general-purpose software. 

The power of static analysis to consider all program behaviors follows from its 
ability to safely over-approximate program behaviors by abstracting the concrete 
domain of program variables and the programming language semantics. At the 
same time, however, its over-approximating nature causes static analysis to 
report some property violations as uncertain. The reason for this uncertainty is 
that a static analysis cannot tell whether a violation happens on a feasible or an 
infeasible, i.e., strictly over-approximating, program behavior. This 
inconclusiveness is unacceptable, since each potential violation must be examined 
further. An automatic solution to the elimination of false-positive violations is 
to increase the precision of a static analysis, i.e., to improve the analysis so that 
it considers fewer infeasible behaviors. 

¹ http://www.mathworks.com/products/polyspace.html. 

However, improving analysis precision generally increases analysis cost in 
terms of running time and memory consumption. A common approach to address 
this problem is to decompose the program’s state space into several subspaces 
and perform analysis on each separately. What distinguishes those techniques 
is the underlying decomposition method. 

One approach focuses on making a precise static analysis scalable by decom- 
posing a large program into modules like procedures and classes, and allowing 
the analysis to examine each partition independently. Next, the analyzed infor- 
mation of each module is composed together to obtain the result of the whole 
program analysis. In the literature [3,4] this method is referred to as partial 
static analysis. 

Another approach aims to improve the scalability of precise analysis by 
permitting the analysis to explore only those program states for which it is 
adequately precise, i.e., able to provide a definitive result. In the literature [5-7] 
this approach is called conditional static analysis (CSA), since the permitted 
states are described by a condition θ expressed as a logical formula. In such a 
framework an analysis verifies a program under some assumptions, e.g., that 
there are no null pointer exceptions or that a precondition on input values holds. 
Next, another analysis attempts to prove these assumptions by showing that the 
states that do not satisfy θ are either not reachable or do not lead to property 
violations. In prior work the condition θ is either determined from the analysis 
design [5,6], where θ is applicable to all program states, or determined during 
program analysis execution [7], where θ is composed of the conditions assumed 
to hold for a certain set of states. 

While previous work on CSA focuses on finding values of θ that ensure an 
increase in analysis precision, in this paper we explore the decomposition of 
the program's state space in order to improve the efficiency of the analysis. We 
decompose the program's state space based on the program's control flow graph 
(CFG), i.e., on the program's structural information. Each partition corresponds 
to a set of paths expressed as a set of CFG branches π. This permits a path-defined, 
or π-defined, CSA to compute invariants for each π independently and in parallel. 
While one can use a logical formula θ as a precondition to restrict program 
input values to those that follow a particular path, we conjecture two primary 
advantages of structural decomposition. First, π is expressed directly as a subset 
of CFG branches, whereas computing an equivalent θ, expressing constraints on 
input values, would require complex value-propagating analyses. Second, because 
π is structural, its effect on the analysis is independent of the abstract domain, 
whereas even an equivalent θ may not be effective in preventing values from 
flowing along a branch due to over-approximation by the abstract domain. 


Conditional Data-Flow Analysis 251 


The contributions of this paper are the presentation of: 


1. A formalization of path-defined CSA as a data-flow framework. 
2. Two algorithms for implementing CSA in existing analysis frameworks. 
3. An approach to efficiently partition CFG paths for path-defined CSA. 


In the next section, we provide an overview of the structural CSA approach 
and pose our research questions. After that we formalize CSA in Sect. 3 and 
demonstrate in Sect. 4 two different ways of implementing CSA in an existing 
program analysis framework. In Sect. 5 we present our approach to partitioning 
a CFG. Then we present our experiments and discuss related work. 


2 Overview 


We begin with an example of a traditional data-flow analysis. Data-flow analysis 
calculates some information for each point in a program based on the program 
structure and the language semantics. The calculated facts, i.e., program invari- 
ants, are then later used to reason about program properties, usually safety prop- 
erties, which must hold on all feasible program executions. Data-flow analyses 
that compute invariants that are satisfied by all paths are called must analyses. 
In our example we show how a data-flow analysis computes invariants for each 
program statement. 

Consider a program and its corresponding CFG in Fig. 1(a). In this example x 
is an integer variable. The edges of the CFG are labeled with T for true branches 
and F for false branches of the conditional statement. 

In order to calculate invariants, static analysis (SA) works with abstract values 
of x, which are composed of the elements of an abstract domain. For example, 
the signs abstract domain has three elements {+, 0, −}: 0 denotes the singleton 
set {0} of concrete values, + denotes positive values, and − denotes negative 
values. If SA employs the signs abstract domain, then the values of x are expressed 
as a set containing any of those three elements, including the special cases {} = ⊥ 
for no values and {+, 0, −} = ⊤ for all values. 

SA starts by assigning ⊤ to x at the CFG's entry point, since x can have 
any concrete value. Upon encountering the conditional statement, SA computes 
invariants for x along the true branch, then along the false branch, and then 
merges these values before the return statement. Fig. 1(b) shows the result of 
the analysis, where the CFG's edges are annotated with the computed invariants 
for x. Clearly, the computations along these two branches are independent of 
each other and could be done simultaneously, thus reducing the computational 
time. This observation is the main idea behind our approach. 
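The following Python sketch replays the full signs analysis on edge invariants. It assumes, from the residue of Fig. 1(a), that the program is `if (x > 0) x = -2*x; else x = 0; return x;`; the function names and edge labels are hypothetical. Note that the true-branch and false-branch computations never read each other's results, which is the independence exploited by CSA.

```python
# Signs domain: an abstract value is a frozenset of signs.
BOT = frozenset()                      # no concrete values
TOP = frozenset({"-", "0", "+"})       # all concrete values
FLIP = {"-": "+", "0": "0", "+": "-"}  # sign of -2*x for each sign of x


def refine_gt0(v, branch_taken):
    """Restrict v to the values that take the given branch of x > 0."""
    return v & frozenset({"+"}) if branch_taken else v & frozenset({"-", "0"})


def times_minus2(v):
    """Abstract transfer function of x = -2*x."""
    return frozenset(FLIP[s] for s in v)


def assign_zero(v):
    """Abstract transfer function of x = 0 (BOT stays BOT)."""
    return frozenset({"0"}) if v else BOT


def signs_analysis(x_entry=TOP):
    """Edge invariants of x for the assumed Fig. 1(a) program."""
    inv = {}
    # The two branch computations do not depend on each other.
    inv["cond->then"] = refine_gt0(x_entry, True)
    inv["cond->else"] = refine_gt0(x_entry, False)
    inv["then->ret"] = times_minus2(inv["cond->then"])
    inv["else->ret"] = assign_zero(inv["cond->else"])
    # Merge point before "return x": the join is set union.
    inv["ret"] = inv["then->ret"] | inv["else->ret"]
    return inv


result = signs_analysis()
print(sorted(result["ret"]))  # prints ['-', '0']
```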

In other parallel SA approaches, which we discuss in Sect. 7, the parallel 
computation is done inside a full SA. During the computation, a parallel SA 
waits at the merge point, where the analysis combines the results of the two 
branches, until each branch has completed before proceeding further, thereby 
reducing parallelism. 
[Figure 1: four CFG panels (a)-(d), each containing the conditional x>0, the branch 
statements x=-2*x and x=0, and return x; in (b)-(d) the edges are annotated with 
the computed sign invariants for x.] 

Fig. 1. Source code and its CFG (a); analysis examples: signs analysis result (b), CSA 
signs analysis result for the set of paths with the 1t prefix (c) and the 1f prefix (d) 


Moreover, if we can analyze the true and the false branches independently, 
then the invariants computed along the true branch could be made available to 
a user even sooner. This observation is another inspiration for designing an 
"anytime DFA", which provides sound information about some of the program's 
invariants while the analysis proceeds. 

As mentioned, in general it would be difficult to compute a precondition 
θ that restricts the input values of x to only those that would take the CSA 
computation to a particular set of branches. In path-defined CSA, however, those 
branches can be stated explicitly. In our example we can have two sets of paths: 
one defined by π1 = {1t}, i.e., take the true branch of the first conditional 
statement, and one defined by π2 = {1f}, i.e., take the false branch of the first 
conditional statement. The results of these two path-defined CSA are shown in 
Fig. 1(c) and (d), respectively. The union of the abstract element sets computed 
by the π1 CSA and the π2 CSA on the corresponding edges yields the same 
invariants as the full analysis, that is, CSA produces sound results. Section 3 
formalizes the conditions under which soundness holds in CSA. Overall, CSA can 
potentially provide two main benefits to a user: (1) a speedup of the analysis 
through parallelism and (2) fast, useful feedback. 
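A small Python sketch of the two path-defined analyses and their combination, again assuming that Fig. 1(a) is `if (x > 0) x = -2*x; else x = 0; return x;`. The branch labels "1t"/"1f" follow the text; `csa_signs` and `join_all` are hypothetical helpers. The excluded branch receives ⊥, and joining the π1 and π2 results edge by edge reproduces the full analysis.

```python
BOT = frozenset()
TOP = frozenset({"-", "0", "+"})
FLIP = {"-": "+", "0": "0", "+": "-"}


def csa_signs(pi):
    """pi-defined signs CSA of the assumed Fig. 1(a) program.

    pi is a set of required branch labels: "1t" (true branch of the first
    conditional) or "1f" (its false branch). The opposite branch of a
    required one receives BOT, which then propagates to its successors.
    """
    then_in = TOP & frozenset({"+"})        # x > 0 holds
    else_in = TOP & frozenset({"-", "0"})   # x > 0 fails
    if "1f" in pi:
        then_in = BOT
    if "1t" in pi:
        else_in = BOT
    then_out = frozenset(FLIP[s] for s in then_in)   # x = -2*x
    else_out = frozenset({"0"}) if else_in else BOT  # x = 0
    return {"then": then_in, "else": else_in, "then_out": then_out,
            "else_out": else_out, "ret": then_out | else_out}


def join_all(results):
    """Edge-wise join (set union) of several CSA results."""
    return {k: frozenset().union(*(r[k] for r in results)) for k in results[0]}


full = csa_signs(set())          # empty pi: the traditional analysis
pi1 = csa_signs({"1t"})
pi2 = csa_signs({"1f"})
print(sorted(pi1["ret"]), sorted(pi2["ret"]))  # prints ['-'] ['0']
print(join_all([pi1, pi2]) == full)            # prints True
```

Each of the two calls could run in its own thread or process; the join only needs their final results.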

One of the objectives of our work is to investigate the efficiency of two 
π-defined CSA implementations in an existing data-flow framework and their 
ability to compute sound invariants at intermediate points of the analysis. 

To evaluate efficiency improvements we consider a traditional reaching 
definitions (RD) analysis and a value-based data-flow analysis (VB) for disjoint 
domains [8], similar to the one used in the above example. Our approach 
automatically generates a set of path conditions π for each method based on the 
heuristics discussed in Sect. 5. It then recombines the CSA results in the order of 
their completion and compares the result of each combination step to the results 
of the full SA. Through our experiments we aim to answer the following research 
questions: 


1. Does path-defined CSA compute sound invariants faster than SA? 
2. At what rate does CSA compute sound invariants? 
3. How efficient are the two implementations of CSA? 
We answer these research questions through an extensive empirical evaluation 
on real-world programs. 


3 Conditional Analysis 


In this section we first present the traditional monotone framework for data-flow 
analysis, followed by a discussion of the changes necessary to extend it to 
a conditional data-flow framework. This section also outlines the approach of 
composing the unconditional result from the conditional ones. 

We use a data-flow analysis framework similar to the one presented in [9] for 
an analysis A, except that we extend it to express branch-sensitive analysis, where 
the outgoing flow of a statement l ∈ CFG_P is defined for each of its outgoing 
edges (l, l') ∈ CFG_P. Thus, the following parameters define A. 


– The complete lattice D_A that describes the abstract domain of A. 

– CFG_P for a program P. 

– A set of monotone transfer functions F_A, one for each edge (l, l') ∈ CFG_P, 
that map an element of D_A to itself, i.e., f_{ll'} ∈ F_A : D_A → D_A. 

– Entry statements E in CFG_P. 

– An initial value ι ∈ D_A for statements in E. 


Then the set of equations for a forward analysis A is defined as follows on entry 
to and exit from each statement l ∈ CFG_P: 

$$A_{in}(l) = \bigsqcup \{ A_{out}(l', l) \mid (l', l) \in CFG_P \} \sqcup \iota^{l}_{E}, \quad \text{where } \iota^{l}_{E} = \begin{cases} \iota & \text{if } l \in E \\ \bot & \text{if } l \notin E \end{cases} \qquad (1)$$

$$A_{out}(l, l') = f_{ll'}(A_{in}(l)), \quad (l, l') \in CFG_P$$

where ⊔ is the least upper bound operator and ⊥ is the bottom element of D_A, for 
which ∀d ∈ D_A : ⊥ ⊔ d = d and ∀(l, l') ∈ CFG_P : f_{ll'}(⊥) = ⊥. For safety, ⊥ 
corresponds to the empty set of concrete values and ⊤ to the set containing all 
concrete values. The value of ι is set to ⊤, i.e., the analysis considers all 
possible input values for a program. The solution of the above set of equations 
provides the result of the analysis for P. 

In our work we express a condition for DFA as a condition π that identifies 
the set of paths to be analyzed, which defines a CFG partition. We describe 
CSA as a special case of A, which we denote as A^π. Thus, a traditional data-flow 
analysis is A = A^∅; branches not specified in π are explored fully. For our 
formulation of CSA, the edges in π are not nested inside a loop. 

We have chosen π to be represented by a set of branch edges in CFG_P, 
at most one for each conditional statement l, which the analysis must include 
while excluding their counterparts. If l has l' and l'' as its true and false targets, 
respectively, then π can contain the edge (l, l'), or the edge (l, l''), or neither of 
them. To capture the relation between the opposite branches of l we designate 
(l, l') = ¬(l, l'') and vice versa (l, l'') = ¬(l, l'). If (l, l') ∈ π, then the values of all 
variables x_i incoming to the target l'' of its opposite edge, i.e., along edge ¬(l, l'), 
are set to ⊥. For brevity, we denote such a case, i.e., when ∀i : x_i = ⊥, as the ⊥ 
state. The ⊥ values of the infeasible edges are propagated further to their 
successors, thereby excluding them from the analysis. The same principle applies 
when the opposite edge (l, l'') ∈ π. When neither edge is present in π, the 
analysis treats them in its usual manner, i.e., propagates the information through 
both branches. 

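With these path-based conditions we can now write the set of equations of the 
conditional data-flow framework for an analysis A^π: 

$$A^{\pi}_{in}(l) = \bigsqcup \{ A^{\pi}_{out}(l', l) \mid (l', l) \in CFG_P \} \sqcup \iota^{l}_{E}, \quad \text{where } \iota^{l}_{E} = \begin{cases} \top & \text{if } l \in E \\ \bot & \text{if } l \notin E \end{cases} \qquad (2)$$

$$A^{\pi}_{out}(l, l') = \begin{cases} f_{ll'}(A^{\pi}_{in}(l)) & \text{if } (l, l') \in CFG_P \text{ and } \neg(l, l') \notin \pi \\ \bot & \text{if } (l, l') \in CFG_P \text{ and } \neg(l, l') \in \pi \end{cases}$$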


Let Π be a set of path-based conditions for an analysis A. Executing A 
with different conditions π_i ∈ Π produces a set of conditional analyses A^{π_i}. The 
solution for an l ∈ CFG_P over Π can be expressed as the join over the maximal 
fixed point (MFP) solutions computed by each A^{π_i}; when this join equals the 
MFP solution of A, SA and CSA produce the same results: 

$$\bigsqcup_{\pi_i \in \Pi} MFP_{A^{\pi_i}}(l) = MFP_{A}(l) \qquad (3)$$


Since SA performs the computation over all program execution paths, CSA 
must ensure the same in order to be sound. For example, consider two conditions 
{(l, l')} and {¬(l, l')}. The conditional analysis A^{(l,l')} analyzes all possible 
input values for the set of paths containing the true branch of l, while 
A^{¬(l,l')} does so for the set of paths containing the false branch of l. 
Thus, together A^{(l,l')} and A^{¬(l,l')} analyze all program paths. To formalize 
the soundness of CSA, we express π as a boolean function g_π as follows. 

Each true edge in CFG_P is mapped to a boolean variable x_i and each false 
edge to ¬x_i. The edges in π are then mapped to a set of literals, and g_π is 
expressed as the conjunction of those literals. In our example, if (l, l') is mapped 
to x_1, then g_{(l,l')} = x_1 and g_{¬(l,l')} = ¬x_1. The union of these two sets of 
paths is equivalent to the disjunction of g_{(l,l')} and g_{¬(l,l')}. Thus, the union 
of the path sets defined by arbitrary π_1 and π_2 is given by g_{π_1} ∨ g_{π_2}. 

Π yields a sound CSA if ⋁_{π_i ∈ Π} g_{π_i} is a tautology. To maximize the efficiency 
of CSA, the conditions π_i should be pairwise disjoint, thereby eliminating 
duplicate computation: 

$$\forall \pi_i, \pi_j \in \Pi,\ \pi_i \neq \pi_j : g_{\pi_i} \wedge g_{\pi_j} = \mathit{false}$$


Therefore, in order for the analysis to be sound and efficient, the partitioning 
algorithm should generate a set of partitions Π that satisfies these two 
constraints. We discuss our partitioning algorithm in Sect. 5. 
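The two constraints on Π can be checked by brute force for small examples. In this Python sketch (hypothetical helper names), a partition π is a dict from a conditional id to its required outcome, and `g` evaluates g_π as the conjunction of those literals. Like the encoding above, it treats every conditional as an independent boolean variable and ignores reachability; enumeration is exponential, so it is only an illustration, not how a real partitioner would verify Π.

```python
from itertools import product


def g(pi, assignment):
    """g_pi: conjunction of the literals of pi under a truth assignment.

    pi maps a conditional id to True (its true edge) or False (its false
    edge); conditionals absent from pi are unconstrained.
    """
    return all(assignment[c] == outcome for c, outcome in pi.items())


def check_partitions(partitions, conditionals):
    """Brute-force check of the two constraints on a set of partitions.

    Returns (sound, disjoint): sound when the disjunction of all g_pi is a
    tautology, disjoint when no assignment satisfies two different g_pi.
    """
    sound, disjoint = True, True
    for values in product([False, True], repeat=len(conditionals)):
        assignment = dict(zip(conditionals, values))
        hits = sum(g(pi, assignment) for pi in partitions)
        sound = sound and hits >= 1
        disjoint = disjoint and hits <= 1
    return sound, disjoint


fig2 = [{"c1": False}, {"c1": True, "c2": False}, {"c1": True, "c2": True}]
print(check_partitions(fig2, ["c1", "c2"]))            # prints (True, True)
print(check_partitions([{"c1": True}], ["c1", "c2"]))  # prints (False, True)
overlap = [{"c1": True}, {"c1": True, "c2": False}, {"c1": False}]
print(check_partitions(overlap, ["c1", "c2"]))         # prints (True, False)
```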




Algorithm 1. A branch-sensitive work-list algorithm for a CFG 

1: w ← quasiTopOrder(CFG) 
2: while ¬w.isEmpty() do 
3:   l ← w.removeNext() 
4:   in ← ⊥ 
5:   for p ∈ pred(l) do 
6:     in ← merge(in, out[p][l]) 
7:   end for 
8:   outNew ← f(in, l) 
9:   for s ∈ succ(l) do 
10:    if outNew[s] ≠ out[l][s] then 
11:      out[l][s] ← outNew[s] 
12:      if ¬w.contains(s) then 
13:        w.insert(s) 
14:      end if 
15:    end if 
16:  end for 
17: end while 

Algorithm 2. A quasi-topological order for a CFG 

1: quasiTopOrder(CFG) 
2:   N ← |CFG| 
3:   for i ∈ (1, ..., N) do 
4:     marked[i] ← false 
5:   end for 
6:   indx ← 0 
7:   DFS(CFG.entry()) 
8:   return ordered 

1: DFS(l) 
2:   if ¬marked[l] then 
3:     marked[l] ← true 
4:     for s ∈ succ(l) do 
5:       DFS(s) 
6:     end for 
7:     ordered[indx] ← l 
8:     indx ← indx + 1 
9:   end if 


4 Implementations of Conditional Analysis 


Static analysis developers commonly solve Eq. 1 using an iterative work-list 
algorithm that propagates the abstract values from the entry nodes l ∈ E, usually 
the single entry node of a program, to the rest of the nodes while computing the 
A_in and A_out flow values. The algorithm terminates when A_in and A_out are 
unchanged for every node in the CFG. 

Algorithm 1 sketches a basic work-list algorithm for a branch-sensitive data-flow 
analysis, where for brevity A_in and A_out are denoted as in and out, 
respectively. A work-list data structure w keeps track of CFG nodes whose in 
values changed in the previous iteration and thus require recalculation. The 
computation reaches a fixed point when no changes in in are detected, which 
corresponds to w becoming empty. At each iteration a node l is removed 
from w, its incoming flow is calculated (lines 4-7), and its new outgoing flow 
for each of its successors is computed using the transfer function f (line 8). 
That is, outNew is an array where each element contains the outgoing flow to 
one of l's successors. For example, for a conditional statement the first element 
is associated with the true branch and the second element with the false branch. 
Lines 9-16 determine the changes in the outgoing flows for each of l's successors 
by comparing the new and old values of out, and insert the affected successors 
back into w. 
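A Python sketch that follows the shape of Algorithm 1, run on the assumed Fig. 1(a) program (`if (x > 0) x = -2*x; else x = 0; return x;`) with the signs domain. The function names and the dictionary-based CFG are illustrative assumptions, not the Soot API. One deliberate addition is that the entry node merges in the initial value ι = ⊤ of Eq. 1, which line 4 of the listing leaves implicit. The work-list is kept sorted by the given quasi-topological rank.

```python
BOT = frozenset()
TOP = frozenset({"-", "0", "+"})


def worklist_solve(succ, order, entry, init, transfer, merge):
    """Branch-sensitive work-list solver following the shape of Algorithm 1.

    succ:     dict node -> list of successors
    order:    all nodes in quasi-topological order (initial work-list)
    transfer: transfer(in_value, node) -> dict successor -> outgoing value
    merge:    join of two abstract values
    Returns out, a dict (node, successor) -> abstract value.
    """
    pred = {n: [] for n in order}
    for n in order:
        for s in succ[n]:
            pred[s].append(n)
    rank = {n: i for i, n in enumerate(order)}
    out = {(n, s): BOT for n in order for s in succ[n]}
    work = list(order)
    while work:
        l = work.pop(0)                          # removeNext()
        value = init if l == entry else BOT      # iota term of Eq. 1
        for p in pred[l]:
            value = merge(value, out[(p, l)])
        out_new = transfer(value, l)
        for s in succ[l]:
            if out_new[s] != out[(l, s)]:
                out[(l, s)] = out_new[s]
                if s not in work:
                    work.append(s)
                    work.sort(key=rank.get)      # keep quasi-topological order
    return out


# Usage on the assumed Fig. 1(a) program with the signs domain.
SUCC = {"cond": ["then", "else"], "then": ["ret"], "else": ["ret"], "ret": []}
FLIP = {"-": "+", "0": "0", "+": "-"}


def signs_transfer(v, node):
    if node == "cond":                           # x > 0, branch-sensitive
        return {"then": v & {"+"}, "else": v & {"-", "0"}}
    if node == "then":                           # x = -2*x
        return {"ret": frozenset(FLIP[s] for s in v)}
    if node == "else":                           # x = 0
        return {"ret": frozenset({"0"}) if v else BOT}
    return {}                                    # return x: no successors


out = worklist_solve(SUCC, ["cond", "then", "else", "ret"], "cond", TOP,
                     signs_transfer, lambda a, b: a | b)
print(sorted(out[("then", "ret")]), sorted(out[("else", "ret")]))  # prints ['-'] ['0']
```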

In order to further improve the efficiency of the work-list algorithm, an analysis 
framework takes into consideration the ordering of nodes in the CFG. 
It ensures that the nodes in w appearing topologically before a given node are 
processed first. Since the CFG can be a cyclic graph, the framework populates w 
using a quasi-topological ordering algorithm similar to the one presented in 
Algorithm 2. The node removal and insertion operations on w preserve the CFG's 
quasi-topological ordering. 

Algorithm 3. CSA1 implementation of f 

1: f(in, l) 
2:   for s ∈ succ(l) do 
3:     if in = ⊥ ∨ ¬(l, s) ∈ π then 
4:       outNew[s] ← ⊥ 
5:     else 
6:       outNew[s] ← f(in, l, s) 
7:     end if 
8:   end for 
9:   return outNew 

Algorithm 4. CSA2 implementation of DFS 

1: DFS(l) 
2:   if ¬marked[l] then 
3:     marked[l] ← true 
4:     for s ∈ succ(l) do 
5:       if ¬(l, s) ∉ π then 
6:         DFS(s) 
7:       end if 
8:     end for 
9:     ordered[indx] ← l 
10:    indx ← indx + 1 
11:  end if 

A program analysis framework provides analysis developers with implementations 
of these work-list and ordering algorithms. The developers instantiate 
their analyses by providing implementations of the merge and f functions, as well 
as an abstract domain and initial flow values. We present two approaches for 
implementing CSA in such an analysis framework. 
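A Python sketch of a quasi-topological order in the style of Algorithm 2. The listing records each node after its successors (post-order) in `ordered`; here we assume the work-list consumes the reversed list, so that every node precedes its successors except along loop back edges. Whether a real framework reverses the array or consumes it from the end is an implementation detail the listing leaves open. The recursive DFS is fine for small CFGs but may hit Python's recursion limit on very large ones.

```python
def quasi_top_order(succ, entry):
    """Reverse post-order of a DFS from entry (a quasi-topological order).

    As in Algorithm 2, a node is recorded after all of its successors have
    been visited (post-order). Reversing that list places every node before
    its successors, except along loop back edges.
    """
    marked, ordered = set(), []

    def dfs(l):
        if l in marked:
            return
        marked.add(l)
        for s in succ.get(l, []):
            dfs(s)
        ordered.append(l)

    dfs(entry)
    return ordered[::-1]


# A CFG with a loop: entry -> loop -> {body, exit}, body -> loop (back edge).
cfg = {"entry": ["loop"], "loop": ["body", "exit"], "body": ["loop"], "exit": []}
print(quasi_top_order(cfg, "entry"))  # prints ['entry', 'loop', 'exit', 'body']
```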

The first approach, CSA1, uses the transfer function f to set the outgoing 
flows of the infeasible branches, and thus of their successors, to ⊥. Algorithm 3 
details that approach. Here π is a global variable which, in line 3, determines 
whether the outgoing flow for a successor should be set to ⊥ or computed using 
the transfer function f(in, l, s) of the full SA. Extending an analysis framework to 
implement CSA in f is straightforward and does not require analysis developers 
to further understand the framework's implementation. However, CSA1 does 
perform extra computations along infeasible program paths. 
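A Python sketch of the CSA1 idea from Algorithm 3: wrap an existing full-SA edge transfer function so that an edge whose opposite edge is required by π (or whose input is already ⊥) produces ⊥. The names `make_csa1_transfer`, `signs_edge`, and the `opposite` map are hypothetical; π is passed in explicitly instead of being a global variable.

```python
BOT = frozenset()


def make_csa1_transfer(transfer, pi, opposite):
    """Wrap a full-SA edge transfer function as in Algorithm 3 (CSA1).

    transfer(in_value, l, s): outgoing value of the full SA along edge (l, s)
    pi:       set of required branch edges (l, s)
    opposite: dict mapping each branch edge to its opposite edge
    """
    def f(in_value, l, successors):
        out_new = {}
        for s in successors:
            # Line 3: BOT input, or the opposite edge of (l, s) is required.
            if in_value == BOT or opposite.get((l, s)) in pi:
                out_new[s] = BOT
            else:
                out_new[s] = transfer(in_value, l, s)
        return out_new
    return f


def signs_edge(v, l, s):
    """Full-SA transfer for the two branches of x > 0 (signs domain)."""
    return v & frozenset({"+"}) if s == "then" else v & frozenset({"-", "0"})


OPP = {("cond", "then"): ("cond", "else"), ("cond", "else"): ("cond", "then")}
f_pi1 = make_csa1_transfer(signs_edge, {("cond", "then")}, OPP)
print(f_pi1(frozenset({"-", "0", "+"}), "cond", ["then", "else"]))
```

The wrapper leaves the work-list untouched, so nodes below the excluded branch are still visited and simply receive ⊥; this is the extra work that CSA2 avoids.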

[Figure 2: a CFG with conditional statements c_i and code blocks b_i (left) and the 
abstracted graph derived from it (right).] 

Fig. 2. Combining selected conditional statement c2 and CFG (left) to produce an 
abstract graph (right) encoding Π = {{c1f}, {c1t, c2f}, {c1t, c2t}} 

The second approach, CSA2, addresses this potential performance drawback 
by modifying the quasi-topological DFS search as shown in Algorithm 4. The 
algorithm does not traverse the CFG down the paths of the excluded branches 
(line 5), thus assigning to w only those nodes that are in π. When a node is 
inserted back into w (Algorithm 1, line 13), only the nodes in π are inserted into 
w at their proper positions. The CSA2 implementation requires analysis developers 
to have an advanced understanding of the analysis framework, i.e., of the 
algorithms and data structures used in the quasi-topological ordering. However, 
this approach only iterates over the nodes that are defined in π. We have 
implemented both approaches and compare them empirically in Sect. 6. In the 
next section we present our approach to partitioning a CFG into a set of 
partitions Π. 
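A Python sketch of the restricted DFS of Algorithm 4 (CSA2). The DFS skips an edge (l, s) whenever its opposite edge is in π, so nodes reachable only through the excluded branch never enter the work-list and their flows keep the initial ⊥. The `opposite` map, the dictionary CFG, and the use of a reversed post-order are illustrative assumptions, as is the Fig. 1(a)-shaped example.

```python
def csa2_order(succ, entry, pi, opposite):
    """Quasi-topological order restricted to pi, following Algorithm 4.

    The DFS does not descend along an edge (l, s) whose opposite edge is
    in pi (line 5), so nodes reachable only through excluded branches never
    enter the work-list; their flows keep their initial BOT value.
    """
    marked, ordered = set(), []

    def dfs(l):
        if l in marked:
            return
        marked.add(l)
        for s in succ.get(l, []):
            if opposite.get((l, s)) not in pi:
                dfs(s)
        ordered.append(l)

    dfs(entry)
    return ordered[::-1]   # reversed post-order (our assumption)


SUCC = {"cond": ["then", "else"], "then": ["ret"], "else": ["ret"], "ret": []}
OPP = {("cond", "then"): ("cond", "else"), ("cond", "else"): ("cond", "then")}
print(csa2_order(SUCC, "cond", {("cond", "then")}, OPP))  # prints ['cond', 'then', 'ret']
print(csa2_order(SUCC, "cond", set(), OPP))  # prints ['cond', 'else', 'then', 'ret']
```

The merge node `ret` stays in the order because it is also reachable through the required branch.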


5 Partitioning CFG 


A program can have many branches, and if we decided to use each of them to 
partition the CFG, the size of Π could become prohibitively large; thus we need 
to determine which branches should be used to generate Π. The goal of our 
selection heuristic is to choose those branches that might reduce the computational 
time. We explore three main characteristics of a conditional statement: 
(a) whether it has non-empty blocks of code b1 and b2 on its true and false 
branches, respectively, (b) the size of b1 and b2 in relation to the entire method, 
and (c) the difference between the sizes of b1 and b2. 

The first heuristic ensures that there is an opportunity for a parallel execution 
of the two branches b1 and b2. The next two heuristics quantify that opportunity. 
Among b1 and b2, we select the one with the maximum block size and calculate 
its ratio to the number of statements in the method. We call this value r1. Then 
we calculate another ratio r2, which is the ratio between the difference in block 
sizes and the number of statements in the method. If we use |b_i| to denote the 
size of block b_i and |m| the number of statements in method m, then 


$$r_1 = \frac{\max(|b_1|, |b_2|)}{|m|}, \qquad r_2 = \frac{\operatorname{abs}(|b_1| - |b_2|)}{|m|}$$


The larger r1 and the smaller r2, the higher the chances that CSA performs 
better if those branches are used to partition the CFG. After selecting 
a set of branches, we first ensure, for a sound CSA analysis, that they do not 
appear inside loops. Next, we combine the selected conditional statements c_i 
with structural information about the CFG to generate an efficient set Π. 
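A Python sketch of the branch-selection heuristic, computing r1 and r2 from the block sizes. The tuple layout for conditionals, the function names, and the example method are hypothetical; the thresholds 3% (minimum r1) and 60% (maximum r2) are the default values reported in Sect. 6.1, and treating the bounds as inclusive is our assumption.

```python
def branch_ratios(b1_size, b2_size, method_size):
    """r1 and r2 of Sect. 5 for a conditional with blocks b1 and b2."""
    r1 = max(b1_size, b2_size) / method_size
    r2 = abs(b1_size - b2_size) / method_size
    return r1, r2


def select_conditionals(conds, method_size, r1_min, r2_max):
    """Keep conditionals worth partitioning on.

    conds: list of (cond_id, b1_size, b2_size, in_loop) tuples.
    A conditional is kept when both blocks are non-empty, it is not inside
    a loop, r1 >= r1_min and r2 <= r2_max.
    """
    selected = []
    for cond_id, b1, b2, in_loop in conds:
        if b1 == 0 or b2 == 0 or in_loop:
            continue
        r1, r2 = branch_ratios(b1, b2, method_size)
        if r1 >= r1_min and r2 <= r2_max:
            selected.append(cond_id)
    return selected


# Hypothetical method with 100 statements; thresholds 3% and 60%.
conds = [("c1", 40, 30, False),   # r1 = 0.40, r2 = 0.10: kept
         ("c2", 2, 1, False),     # r1 = 0.02 < 0.03: too small
         ("c3", 50, 0, False),    # empty false block
         ("c4", 20, 25, True),    # inside a loop
         ("c5", 70, 5, False)]    # r2 = 0.65 > 0.60: too unbalanced
print(select_conditionals(conds, 100, 0.03, 0.60))  # prints ['c1']
```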

For example, consider the CFG on the left of Fig. 2, where c_i are conditional 
statements and b_i are blocks of code. If the heuristic determines that the branches 
of c2 are suitable for the CFG partition, then simply expressing the set of 
partitions Π as {{c2f}, {c2t}} would result in both CSA computing the invariants 
along c1's false branch, that is, performing the computation twice. In order to 
avoid this redundancy, our partition algorithm traverses the CFG, finds all 
branches of the conditional statements through which the originally selected 
conditional statements are reachable, and stores them as an "abstracted" graph 
similar to the one shown on the right of Fig. 2. Next, using the abstracted graph, 
we generate Π for CSA, which in this case is {{c1f}, {c1t, c2f}, {c1t, c2t}}. 
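A minimal Python sketch of turning an abstracted graph into Π, under the simplifying assumption that the abstracted graph is a tree of selected conditionals (the paper does not specify its data structure, and graphs with merges would need more work). Each root-to-leaf path becomes one partition, which makes the resulting partitions pairwise disjoint and jointly cover all paths. The tuple encoding and function name are hypothetical.

```python
def partitions_from_tree(tree, prefix=()):
    """Enumerate partitions as root-to-leaf paths of an abstracted graph.

    tree is None (a leaf) or a tuple (cond_id, on_true, on_false), where
    on_true / on_false are the sub-trees reached through that branch.
    Each partition is a frozenset of (cond_id, outcome) literals.
    """
    if tree is None:
        return [frozenset(prefix)]
    cond, on_true, on_false = tree
    return (partitions_from_tree(on_false, prefix + ((cond, False),))
            + partitions_from_tree(on_true, prefix + ((cond, True),)))


# Fig. 2: c2 is reachable only through the true branch of c1.
abstracted = ("c1", ("c2", None, None), None)
for part in partitions_from_tree(abstracted):
    print(sorted(part))
```

For the Fig. 2 shape this yields {c1f}, {c1t, c2f}, and {c1t, c2t}, so no partition recomputes c1's false branch.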
Such post-processing also handles cases in which both c2 and c3 are marked 
for partitioning. A simplistic approach is to create all possible combinations of 
their branches, but that results in partitions that compute the same invariants; 
for example, {c2f, c3t} and {c2t, c3t} both compute the true branch of c3. In 
contrast, our partition generation detects that c2 and c3 are independent. In our 
evaluation section we describe the threshold values we used for the r1 and r2 
parameters. 


6 Evaluation 


We evaluate our implementations of the path-defined conditional analysis using 
two distinct analyses: an intra-procedural value-based analysis (VB) and an 
intra-procedural reaching definitions analysis (RD). For VB analysis we used the 
implementation and abstract domains that we developed in our previous work [8]. 
For RD we used the implementation provided with the Soot framework distribution. 
RD is a relatively fast analysis with an easily computable transfer function, while 
VB takes longer to complete due to its complex transfer function evaluations. For 
each of the analyses we performed experiments with their full SA versions, i.e., 
VB and RD, their CSA1 versions implemented with Algorithm 3, which we name 
CVB1 and CRD1, and their CSA2 versions implemented with Algorithm 4, which 
we name CVB2 and CRD2, respectively. The source code, program subjects, and 
instructions on replicating the experiment are available on GitHub². 


Program Subjects. In order to perform our evaluations we first analyzed 105 
methods in 19 Java classes across 10 open-source projects that we used in our 
previous work [8], where we employed Boa [10] to mine methods of open-source 
programs from GitHub, counted the number of operations in each method, and 
then randomly selected methods that contain at least 180 integer operations. 
Among those 105 methods we selected methods with conditional statements that 
meet the first requirement of our partitioning algorithm, i.e., having a non-trivial 
conditional statement whose true and false branches both have non-empty blocks 
of code. This step reduced the number of methods to 68. Among them, 53 methods 
have at least one non-trivial conditional statement outside of loops, which allows 
for computing a sound CSA. Those methods have on average 177 statements and 
19 simple conditional statements. 


Abstract Domain Subjects for VB Analysis. VB analysis uses atomic ele- 
ments of its abstract domain to express the computed program invariants. To 
determine whether the size of the disjoint abstract domain influences the effi- 
ciency of VB analysis we used three disjoint abstract domains of small (8 atomic 
elements), medium (10 atomic elements) and large (12 atomic elements) sizes. 
We randomly chose those abstract domains among available disjoint domains 
with the same number of atomic elements. Our preliminary experiments have 
shown that there is no difference in the evaluation data between the domain 
sizes, so we present the data only for the medium size domain. 


² https://github.com/BoiseState/Conditional-DFA. 




[Figure 3: four histograms with x-axes "RD/CRD1 ratio", "RD/CRD2 ratio", 
"VB/CVB1 ratio", and "VB/CVB2 ratio", each spanning 0.0 to 2.5; the labels on top 
of the bars give method counts.] 

Fig. 3. Histograms of ratios between runtimes of full and conditional analyses. 


6.1 Experiment Description 


First we analyze the 53 methods using full SA, recording its run time and the 
computed invariants after each statement. The CSA evaluation consists of three 
main steps: (1) generating a set of partitions Π for each method, (2) running the 
CSA1 and CSA2 analyses on the partitions and recording run times and invariants, 
and (3) aggregating the computed invariants for partitions of the same method. 
We run the experiments on a 2.9 GHz Intel Core i5 processor with 8 GB of memory 
running OS X, with the analysis running on Java RE 1.8. 


Step 1. We implemented the partition algorithm from Sect. 5 in the Soot Java 
Optimization framework to take advantage of Soot's CFG and other related data 
structures. The partition algorithm takes as input a class and its method to be 
partitioned, and the parameters r̂1, which determines the minimum value for r1, 
and r̂2, which determines the maximum value for r2. In our evaluations we set 
r̂1 = 3% and r̂2 = 60% for the majority of the methods, and increased r̂1 and 
decreased r̂2 when the number of partitions became greater than 45. This resulted 
in an increase of r̂1 to 15% for two methods and the following (r̂1, r̂2) values for 
three methods: (15%, 30%), (20%, 15%), and (20%, 30%). 

This step produced a total of 472 partitions for 53 methods, with a minimum 
of two and a maximum of 32 partitions per method. A partition π is encoded as 
the set of branches that CSA should take, each defined by the conditional 
statement id and the branch's outcome: either true or false. As defined in our 
CSA framework, if a conditional statement is not present in π, then CSA explores 
both of its branches. 


Step 2. We implemented VB, CVB1, and CVB2 in the Soot Java Optimization 
framework and used Z3 version 4.3.2 as the constraint solver. CVB takes the 
following input parameters: a class name and its method to be analyzed, an 
abstract domain, and a partition π. We executed CVB1 and CVB2 for each partition 
π, as well as the full VB analysis. We implemented RD, CRD1, and CRD2 in the Soot 
framework as well. CRD takes three input parameters: a class name and its method 
to be analyzed, and a partition π. 

We recorded two sets of data that CSA produces: the running time of the 
analysis and the computed invariants for the corresponding analysis, i.e., sets of 
reaching definition elements for CRD and abstract values for variables expressed 
as SMT constraints for CVB. We execute each experiment three times and use their 
average to assess the CSA performance. We do not report the time for partitioning, 
since the partitioning is performed once and its running time is negligible 
compared to the analysis time. For the same reason we do not report the time 
for combining the analysis results, described in the next step. 

Table 1. CRD Cost vs. Precision (number of methods) 

Time ratio    % of sound invariants of RD 
to RD            0  (0,25)  [25,50)  [50,75)  [75,100)  100 
CRD1 analysis 
< 0.2           45       8        0        0         0    0 
< 0.4           43       6        1        2         0    1 
< 0.6           32      11        3        1         0    6 
< 0.8           26      11        5        1         0   10 
< 1.0           19       9        7        1         3   14 
CRD2 analysis 
< 0.2           31      18        1        2         0    1 
< 0.4           21      16        4        4         2    6 
< 0.6           13      10        9        3         5   13 
< 0.8           10       8        5        2         5   23 
< 1.0            1       2        2        2         6   40 

Table 2. CVB Cost vs. Precision (number of methods) 

Time ratio    % of sound invariants of VB 
to VB            0  (0,25)  [25,50)  [50,75)  [75,100)  100 
CVB1 analysis 
< 0.2           23      21        5        3         1    0 
< 0.4           15      18        7        5         4    4 
< 0.6           13      11        7        6         3   13 
< 0.8            7       7        5        5         8   21 
< 1.0            1       0        2        0         5   45 
CVB2 analysis 
< 0.2           23      20        6        3         1    0 
< 0.4           16      17        7        5         4    4 
< 0.6           13      11        7        6         3   13 
< 0.8            7       7        5        5         8   21 
< 1.0            1       0        1        1         3   47 


Step 3. In the last step we combine the invariants of CSA in a way that allows us 
to answer our research questions. First we order the method partitions based on 
their average execution time. Then, in order to determine all invariants computed 
at the point when a CSA completes, we combine all invariants from the previously 
completed CSA with the current one. The result is a sequence of aggregated 
invariants ordered by the execution time of the partitions, from fastest to slowest. 
To compare SA and CSA invariants we use the logical equivalence relation for two 
invariants. To compare RD and CRD we compare their sets of reaching definitions 
at each program location. To compare VB and CVB we evaluate implication 
relations between their SMT formulas, i.e., (CVB ⇒ VB) ∧ (VB ⇒ CVB), at 
each program point. If the formula evaluates to true, then we count it as a sound 
invariant for CVB. If the formula evaluates to false and the first implication 
evaluates to true, then CSA under-approximates the invariant of SA. Any other 
evaluation of the formula to false would indicate either a conceptual mistake in 
our CSA approach or a bug in our implementations; we have not observed such 
cases in any of our experiments. 
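For RD, the aggregation of Step 3 can be sketched in Python as a running union of per-partition results, processed from the fastest to the slowest partition, counting the program points whose combined set equals the full-RD set. The function name and the data below are hypothetical; the VB comparison, which uses SMT implication checks, is not modeled here.

```python
def cumulative_sound_fraction(sa, csa_runs):
    """Share of program points whose full-SA invariant is reproduced after
    each completed CSA run, processing runs from fastest to slowest.

    sa:       dict program point -> set of reaching definitions (full RD)
    csa_runs: list of (runtime_ms, dict program point -> set) per partition
    """
    combined = {p: set() for p in sa}
    fractions = []
    for _, result in sorted(csa_runs, key=lambda run: run[0]):
        for p, defs in result.items():
            combined[p] |= defs              # join of RD results = set union
        sound = sum(combined[p] == sa[p] for p in sa)
        fractions.append(sound / len(sa))
    return fractions


# Hypothetical data: two partitions of a method with two program points.
sa = {"p1": {"d1"}, "p2": {"d2", "d3"}}
runs = [(30, {"p1": {"d1"}, "p2": {"d3"}}),
        (10, {"p1": {"d1"}, "p2": {"d2"}})]
print(cumulative_sound_fraction(sa, runs))  # prints [0.5, 1.0]
```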


6.2 Results 


Performance. We used the ratio between the runtimes of the slowest CSA partition 
and the full SA for each method to compare CSA and SA performance. Fig. 3 
shows the histograms of the ratios for each analysis implementation. The x-axes 
show the ratio values, and the labels on top of the bars are the counts for each 
bar interval. 

The histograms show that CRD1 performed the worst, since it has many 
executions with higher runtimes than RD. However, their average runtimes 
across the 53 methods are comparable: 148 ms for CRD1 and 143 ms for RD. This is 
because CRD1 performed much better on larger methods than on smaller ones. 
Even though CRD2 has 16 methods with ratios greater than 1, its average runtime 
is 108 ms, which makes this implementation 24% faster than RD. 

Both CVB1 and CVB2 have few methods with ratios greater than 1.0; however, those values are very close to 1.0. Among the 11 CVB1 methods that underperformed, 6 have ratios of 1.01 and the rest have ratios no greater than 1.05. Of CVB2's 8 underperforming methods, 5 have ratios of 1.01, 2 have ratios no greater than 1.05, and one has a ratio of 1.28. The average runtimes across 53 methods are 6989 ms for CVB1 and 7035 ms for CVB2, both about 20% faster than VB's 8689 ms. Even though CVB1 and CVB2 have comparable performance, CVB2 was faster than VB on more methods.
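The relative speedups quoted above follow directly from the reported average runtimes; the short computation below, with a hypothetical helper `percent_faster`, reproduces them.

```python
def percent_faster(csa_ms: float, sa_ms: float) -> float:
    """How much faster (in percent) a CSA average runtime is than the full SA average."""
    return 100.0 * (sa_ms - csa_ms) / sa_ms


# Average runtimes across the 53 methods, as reported in the text.
print(round(percent_faster(108, 143), 1))    # CRD2 vs. RD: 24.5
print(round(percent_faster(6989, 8689), 1))  # CVB1 vs. VB: 19.6
print(round(percent_faster(7035, 8689), 1))  # CVB2 vs. VB: 19.0
```

Rounded to whole percentages, these give the 24% and the roughly 20% figures stated above.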


Invariants. The results for sound invariant computation are presented in Table 1 for CRD and in Table 2 for CVB. The column headers describe two points, "0" and "100", and four ranges, "(0, 25)", "[25, 50)", "[50, 75)" and "[75, 100)", of the percentage of sound invariants of a full SA that CSA is able to compute. The row headers give the ratio of the CSA running time to the full SA running time. The cell values represent the number of methods for which CSA is able to compute sound invariants within the given invariant range and within the given time interval. For example, in Table 2 the cell in the first data row and the second data column contains the value 21, which can be interpreted as follows: for 21 methods, CVB1 is able to produce up to 25% of the sound invariants computed by a full VB in 20% of the time of the full VB. The cell in the second data row and the last column tells us that within 40% of the full VB computational time, CVB1 is able to compute all invariants for 4 methods.
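One way to populate a table of this shape is sketched below. It rests on an assumed reading of the description above, not on details stated in it: each method records, after every completed partition (fastest first), the elapsed time and the aggregated percentage of sound invariants, both relative to the full SA, and a row labeled X% counts the progress reached within X% of the full SA time. The row limits, the data, and all names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Each entry: (percent of full-SA time elapsed, percent of full-SA sound invariants so far),
# recorded when a partition completes, in completion order.
Progress = List[Tuple[float, float]]

ROW_LIMITS = [20, 40, 60, 80, 100]  # hypothetical row headers (% of full SA time)


def column(pct_sound: float) -> str:
    """Column header for a percentage of sound invariants."""
    if pct_sound == 0:
        return "0"
    if pct_sound >= 100:
        return "100"
    for upper, label in [(25, "(0,25)"), (50, "[25,50)"), (75, "[50,75)")]:
        if pct_sound < upper:
            return label
    return "[75,100)"


def sound_within(progress: Progress, limit: float) -> float:
    """Aggregated percentage reached by the partitions that finished within `limit`."""
    return max((sound for time, sound in progress if time <= limit), default=0.0)


def tabulate(methods: List[Progress]) -> Counter:
    """Count methods per (row limit, column) cell."""
    return Counter((limit, column(sound_within(p, limit)))
                   for limit in ROW_LIMITS for p in methods)


table = tabulate([[(15, 10), (35, 100)], [(30, 60), (90, 80)]])
print(table[(20, "(0,25)")], table[(40, "100")], table[(100, "[75,100)")])  # 1 1 1
```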

The data show that CSA can produce sound invariants faster for several methods and compute partial sound invariants for a majority of them. For example, CVB computes all invariants for 21 methods within 80% of the VB runtime and can produce partial sound invariants within 20% of the VB runtime. Note that the histogram counts and the values in the last column might not be equal. This is because CVB was able to produce the same invariant values as VB after computing only a few partitions, so the remaining partitions compute redundant information.

The data show that the efficiency of the two CSA implementations depends on the analysis type. For CRD, CRD2 performs better than CRD1. However, for the CVB analysis both implementations produce close results, with CVB2 performing slightly better than CVB1. CRD is more sensitive to the implementation because it is a relatively fast analysis: it runs in a fraction of a second, while CVB requires several minutes to complete. Overall, the second implementation of CSA, which requires modifying the underlying topological order algorithm, is the better implementation choice.


6.3 Discussion 


The results indicate that CSA allows for faster analysis while requiring minimal modification of SA frameworks. However, the main contribution of CSA is its ability to provide partial invariants in a fraction of the time of SA. While a user waits for all partitions to complete, she can use the invariants provided earlier to check the safety properties of the program. If such a property holds, then the user has more confidence in the program's correctness. However, if the property does not hold for the computed invariants, then she can start investigating its cause. Moreover, the partition information could accelerate this task, since it narrows down the set of paths that cause the property violation.


7 Related Work 


Besides the related work on conditional analysis described in the introduction, our work relates to the body of research that improves the performance of SA algorithms and the accuracy of SA using a program's structural information. Work on designing parallel SA algorithms by partitioning the program's state space started in the 1990s with Lee et al. [11], who partitioned the program CFG into strongly connected components, applied fixed point computation inside those components, and then used an elimination algorithm [12] to combine the data from the external nodes of those components. Albarghouthi et al. [13] investigated parallel interprocedural analysis of C programs, where multiple methods are analyzed intraprocedurally in parallel based on reachability in the call graph. Dewey et al. [14] explore parallel analysis of JavaScript by partitioning the state space of the program into regions that can be computed in parallel and those that require synchronization of the parallel computations, i.e., merging points of the analysis.

Another body of work identifies partitions of the CFG to improve the precision of the analysis by delaying the merge of abstract values from different control flows or by adding new abstract elements that exactly describe the join of two abstract elements, i.e., by computing the disjunctive completion of the partially ordered set. However, disjunctive completion can lead to excessively large representations of abstract values, and at some point at least some values must be joined for the computation to reach its fixed point. Prior research has explored which abstract values should be joined: computational traces [15] or other heuristics based on the CFG, such as the trace partitioning domain method [16], can provide a basis for these decisions.

Another approach is to delay the join operation by conducting incremental analysis, as in guided analysis [17]. In this approach, each iteration of the fixed point computation is applied to an incrementally augmented subgraph of P's CFG. For instance, on the first iteration, i.e., when propagating abstract values through the CFG, the analysis considers only the true branch of a conditional statement, and on the second iteration it adds the false branch. This approach limits the loss of precision resulting from widening operators for numerical domains, such as polyhedra, that have infinite ascending chains. This incremental approach also includes a disjunctive extension in which the analysis performs a fixed point computation before extending the part of the CFG to be analyzed, i.e., it computes invariants successively. An orthogonal approach is the path focusing technique [18], which computes invariants separately for each path between two loop-free points in the CFG. Thus, each part of the CFG between the entrance and the exit of a loop is expanded into a set of paths. After the computation is done, the results of all paths are joined.

The latest development combines the guided analysis and path focusing techniques [19]. In this approach, the analysis continues to evaluate paths between loop-free points, which are encoded separately in an SMT formula. This allows the analysis to explore only those paths that have the potential to improve the precision of the invariants.

Our approach is complementary to the above techniques, since a CSA for a single partition could use a parallel algorithm for its propagation to further improve CSA efficiency.


8 Conclusion and Future Work 


In this work we introduce structurally defined conditional static analysis, formalize it in terms of standard data-flow frameworks, and provide algorithms for CSA and two distinct implementations. We evaluate the efficiency and precision of these techniques through an extensive empirical study on real-world programs. The key insight is that CSA partitions a program's CFG into a set of subgraphs at the conditional statements. These partitions induce a series of independent CSA executions that can run in parallel. The empirical evaluation suggests that CSA provides improvements over the full SA for a significant fraction of the programs. In particular, depending on the analysis, around 24% of methods completed their analysis within 60% of the run time required by the full SA. Moreover, CSA is able to produce partial safe invariants for a majority of the programs.

In the future we plan to further improve the efficiency of CSA and the confidence in the partial information that it produces. Currently, CSAs that follow the same path prefix compute identical information for that prefix; we plan to investigate an approach where only one analysis computes the prefix information and communicates it to the other CSAs with the common prefix. In addition, we would like to classify CSA's partially computed invariants as safe or under-approximating based on the partition that the CSA analyzes. Thus, when a CSA computes an invariant that is marked as safe, the user can use it with the same confidence as she would the result of the full SA.
Abstract. We present a new kind of nontermination argument, called geometric nontermination argument. The geometric nontermination argument is a finite representation of an infinite execution that has the form of a sum of several geometric series. For so-called linear lasso programs we can decide the existence of a geometric nontermination argument using a nonlinear algebraic ∃-constraint. We show that a deterministic conjunctive loop program with nonnegative eigenvalues is nonterminating if and only if there exists a geometric nontermination argument. Furthermore, we present an evaluation that demonstrates that our method is feasible in practice.


1 Introduction 


The problem whether a program is terminating is undecidable in general. One way to approach this problem in practice is to analyze the existence of termination arguments and nontermination arguments. The existence of a certain termination argument, e.g., a linear ranking function, is decidable [4,31] and implies termination. However, if we cannot find a linear ranking function, we cannot conclude nontermination. Vice versa, the existence of a certain nontermination argument, e.g., a linear recurrence set [20], is decidable and implies nontermination; however, if we cannot find such a recurrence set, we cannot conclude termination.

In this paper¹ we present a new kind of nontermination argument, which we call geometric nontermination argument (GNTA). Unlike a recurrence set, a geometric nontermination argument does not only imply nontermination, it also explicitly represents an infinite program execution. Hence a user sees immediately whether the counterexample to termination is a fixpoint or an unbounded diverging execution. An infinite program execution that is represented by a geometric nontermination argument can be written as a pointwise sum of several geometric series. We show that such an infinite execution exists for each deterministic conjunctive loop program that is nonterminating and whose transition matrix has only nonnegative eigenvalues.


¹ An extended version of this paper [29] contains more examples and further explanations.
© The Author(s) 2018 
(a)
b := 1;
while (a+b >= 3):
    a := 3*a + 1;
    b := nondet();

(b)
b := 1;
while (a+b >= 3):
    a := 3*a - 2;
    b := 2*b;

(c)
b := 1;
while (a+b >= 4):
    a := 3*a + b;
    b := 2*b;


Fig. 1. Three nonterminating linear lasso programs. Each has an infinite execution 
which is either a geometric series or a pointwise sum of geometric series. The first lasso 
program is nondeterministic because the variable b gets some nondeterministic value 
in each iteration. 


We restrict ourselves to linear lasso programs. A lasso program consists of a single while loop that is preceded by straight-line code. The name refers to the lasso-shaped form of the control flow graph. Usually, linear lasso programs do not occur as stand-alone programs. Instead, they are used as a finite representation of an infinite path in a control flow graph, for example, in (potentially spurious) counterexamples in termination analysis [6,16,21,22,24,25,32,33,37], stability analysis [11,34], cost analysis [1,19], or the verification of temporal properties [7,13–15,18] for programs.

We present a constraint-based approach that allows us to check whether a linear conjunctive lasso program has a geometric nontermination argument and to synthesize one if it exists.

Our analysis is motivated by probably the simplest form of an infinite execution, namely an infinite execution in which the same state is always repeated. We call such a state a fixed point. For lasso programs we can reduce the check for the existence of a fixed point to a constraint solving problem as follows. Let us assume that the stem and the loop of the lasso program are given as formulas $\mathrm{STEM}(\mathbf{x}, \mathbf{x}')$ and $\mathrm{LOOP}(\mathbf{x}, \mathbf{x}')$ over unprimed and primed variables. The infinite sequence $\mathbf{s}_0, \bar{\mathbf{s}}, \bar{\mathbf{s}}, \bar{\mathbf{s}}, \ldots$ is a nonterminating execution of the lasso program iff the assignment $\mathbf{x}_0 \mapsto \mathbf{s}_0$, $\mathbf{x} \mapsto \bar{\mathbf{s}}$ is a satisfying assignment for the constraint $\mathrm{STEM}(\mathbf{x}_0, \mathbf{x}) \land \mathrm{LOOP}(\mathbf{x}, \mathbf{x})$. In this paper, we present a constraint that is satisfiable not only if the program has a fixed point, but also if the program has a nonterminating execution that can be written as a pointwise sum of geometric series.
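For Fig. 1a the fixed-point constraint can be evaluated directly. The sketch below encodes STEM and LOOP of that program as Python predicates on states (a, b), assuming the stem b := 1 shown in Fig. 1; a pair (x0, x) is a fixed-point witness iff STEM(x0, x) ∧ LOOP(x, x) holds. Exact rational arithmetic avoids rounding, and the helper names are hypothetical.

```python
from fractions import Fraction
from typing import Tuple

State = Tuple[Fraction, Fraction]  # (a, b)


def stem(x0: State, x: State) -> bool:
    """STEM of Fig. 1a (assumed): b := 1, a is unchanged."""
    return x[0] == x0[0] and x[1] == 1


def loop(x: State, xp: State) -> bool:
    """LOOP of Fig. 1a: guard a + b >= 3, then a := 3*a + 1 and b := nondet()."""
    a, b = x
    return a + b >= 3 and xp[0] == 3 * a + 1  # b' is unconstrained


def fixed_point_witness(x0: State, x: State) -> bool:
    return stem(x0, x) and loop(x, x)


p = (Fraction(-1, 2), Fraction(7, 2))  # a = -0.5, b = 3.5
print(loop(p, p))                                   # True: p is a fixed point of the loop
print(fixed_point_witness((p[0], Fraction(0)), p))  # False: the stem forces b = 1
```

With b = 1 the loop would need a = -0.5 and a + 1 ≥ 3 at the same time, so no state reachable through the stem is a fixed point.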

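Let us motivate the representation of infinite executions as sums of geometric series in three steps. The program depicted in Fig. 1a shows a lasso program which does not have a fixed point but has, for example, the following infinite execution (the nondeterministic assignment always chooses b = 1).

$$\begin{pmatrix}2\\1\end{pmatrix}, \begin{pmatrix}7\\1\end{pmatrix}, \begin{pmatrix}22\\1\end{pmatrix}, \begin{pmatrix}67\\1\end{pmatrix}, \ldots$$

We can write this infinite execution as a geometric series where for $t \ge 1$ the $t$-th state is the sum $\mathbf{x}_1 + \sum_{i=0}^{t-2} \lambda^i \mathbf{y}$, where we have $\mathbf{x}_1 = (2, 1)^T$, $\mathbf{y} = (5, 0)^T$, and $\lambda = 3$. The state $\mathbf{x}_1$ is the state before the loop is executed for the first time; intuitively, $\mathbf{y}$ is the direction in which the execution is moving initially, and $\lambda$ is the speed at which the execution continues to move in this direction.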
Next, let us consider the lasso program depicted in Fig. 1b, which has the following infinite execution.

$$\begin{pmatrix}2\\1\end{pmatrix}, \begin{pmatrix}4\\2\end{pmatrix}, \begin{pmatrix}10\\4\end{pmatrix}, \begin{pmatrix}28\\8\end{pmatrix}, \ldots$$

We cannot write this execution as a geometric series as we did above. Intuitively, the reason is that the values of both variables are increasing at different speeds, and hence this execution is not moving in a single direction. However, we can write this infinite execution as a sum of geometric series where for $t \in \mathbb{N} \setminus \{0\}$ the $t$-th state can be written as a sum $\mathbf{x}_1 + \sum_{i=0}^{t-2} Y \begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}^{i} \mathbf{1}$, where we have $\mathbf{x}_1 = (2, 1)^T$, $Y = \begin{pmatrix}2 & 0\\ 0 & 1\end{pmatrix}$, $\lambda_1 = 3$, $\lambda_2 = 2$, and $\mathbf{1}$ denotes the column vector of ones. Intuitively, our execution is moving in two different directions at different speeds. The directions are reflected by the column vectors of $Y$; the values of $\lambda_1$ and $\lambda_2$ reflect the respective speeds.

Let us next consider the lasso program in Fig. 1c, which has the following infinite execution.

$$\begin{pmatrix}3\\1\end{pmatrix}, \begin{pmatrix}10\\2\end{pmatrix}, \begin{pmatrix}32\\4\end{pmatrix}, \begin{pmatrix}100\\8\end{pmatrix}, \ldots$$

We cannot write this execution as a pointwise sum of geometric series in the form that we used above. Intuitively, the problem is that one of the initial directions contributes at two different speeds to the overall progress of the execution. However, we can write this infinite execution as a pointwise sum of geometric series where for $t \in \mathbb{N} \setminus \{0\}$ the $t$-th state can be written as a sum $\mathbf{x}_1 + \sum_{i=0}^{t-2} Y \begin{pmatrix}\lambda_1 & \mu\\ 0 & \lambda_2\end{pmatrix}^{i} \mathbf{1}$, where we have $\mathbf{x}_1 = (3, 1)^T$, $Y = \begin{pmatrix}4 & 3\\ 0 & 1\end{pmatrix}$, $\lambda_1 = 3$, $\lambda_2 = 2$, $\mu = 1$, and $\mathbf{1}$ denotes the column vector of ones. We call the tuple $(\mathbf{x}_0, \mathbf{x}_1, Y, \lambda_1, \lambda_2, \mu)$, which we use as a finite representation of the infinite execution, a geometric nontermination argument.

In this paper, we formally introduce the notion of a geometric nontermination argument for linear lasso programs (Sect. 3), and we prove that each nonterminating deterministic conjunctive linear loop program whose transition matrix has only nonnegative real eigenvalues has a geometric nontermination argument, i.e., each such nonterminating linear loop program has an infinite execution which can be written as a sum of geometric series (Sect. 4).
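The closed forms for Fig. 1b and Fig. 1c can be checked mechanically. The sketch below executes each loop from $\mathbf{x}_1$ and compares the visited states with $\mathbf{x}_1 + \sum_{i=0}^{t-2} Y U^i \mathbf{1}$, using $\mathbf{x}_1 = (2,1)^T$, $Y = \mathrm{diag}(2, 1)$, $U = \mathrm{diag}(3, 2)$ for Fig. 1b and $\mathbf{x}_1 = (3,1)^T$, $Y = \begin{pmatrix}4 & 3\\ 0 & 1\end{pmatrix}$, $U = \begin{pmatrix}3 & 1\\ 0 & 2\end{pmatrix}$ for Fig. 1c, where $U$ carries $\lambda_1, \lambda_2$ on the diagonal and $\mu$ (or 0) above it. Only integer arithmetic is used; the function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[int, ...]


def matvec(A: Sequence[Sequence[int]], v: Sequence[int]) -> List[int]:
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def gnta_states(x1: Vector, Y, U, count: int) -> List[Vector]:
    """States x_t = x_1 + sum_{i=0}^{t-2} Y U^i 1 for t = 1, ..., count."""
    states, x, v = [], list(x1), [1] * len(U)  # v holds U^i 1
    for _ in range(count):
        states.append(tuple(x))
        x = [xi + di for xi, di in zip(x, matvec(Y, v))]
        v = matvec(U, v)
    return states


def run_loop(x1: Vector, guard: Callable, body: Callable, count: int) -> List[Vector]:
    """Execute the loop body `count` times from x_1, checking the guard each time."""
    states, x = [], x1
    for _ in range(count):
        assert guard(x), "guard violated: not an infinite execution"
        states.append(x)
        x = body(x)
    return states


# Fig. 1b: a := 3*a - 2; b := 2*b   with guard a + b >= 3
fig1b = run_loop((2, 1), lambda s: s[0] + s[1] >= 3, lambda s: (3 * s[0] - 2, 2 * s[1]), 8)
assert fig1b == gnta_states((2, 1), [[2, 0], [0, 1]], [[3, 0], [0, 2]], 8)

# Fig. 1c: a := 3*a + b; b := 2*b   with guard a + b >= 4
fig1c = run_loop((3, 1), lambda s: s[0] + s[1] >= 4, lambda s: (3 * s[0] + s[1], 2 * s[1]), 8)
assert fig1c == gnta_states((3, 1), [[4, 3], [0, 1]], [[3, 1], [0, 2]], 8)
print(fig1c[:4])  # [(3, 1), (10, 2), (32, 4), (100, 8)]
```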


2 Preliminaries 


We denote vectors $\mathbf{x}$ with bold symbols and matrices with uppercase Latin letters. Vectors are always understood to be column vectors, $\mathbf{1}$ denotes a vector of ones, $\mathbf{0}$ denotes a vector of zeros (of the appropriate dimension), and $\mathbf{e}_i$ denotes the $i$-th unit vector.


Geometric Nontermination Arguments 269 


2.1 Linear Lasso Programs 


In this work, we consider linear lasso programs, i.e., programs that consist of a program stem and a single loop. We use binary relations over the program's states to define the stem and the loop transition relation. Variables are assumed to be real-valued.


We denote by $\mathbf{x}$ the vector of $n$ variables $(x_1, \ldots, x_n)^T \in \mathbb{R}^n$ corresponding to program states, and by $\mathbf{x}' = (x_1', \ldots, x_n')^T \in \mathbb{R}^n$ the variables of the next state.


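Definition 1 (Linear Lasso Program). A (conjunctive) linear lasso program $L = (\mathrm{STEM}, \mathrm{LOOP})$ consists of two binary relations defined by formulas with the free variables $\mathbf{x}$ and $\mathbf{x}'$ of the form

$$A \begin{pmatrix}\mathbf{x}\\ \mathbf{x}'\end{pmatrix} \le \mathbf{b}$$

for some matrix $A \in \mathbb{R}^{m \times 2n}$ and some vector $\mathbf{b} \in \mathbb{R}^m$.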


A linear loop program is a linear lasso program L without stem, i.e., a linear 
lasso program such that the relation STEM is equivalent to true. 


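Definition 2 (Deterministic Linear Lasso Program). A linear loop program $L$ is called deterministic iff its loop transition LOOP can be written in the following form

$$(\mathbf{x}, \mathbf{x}') \in \mathrm{LOOP} \iff G\mathbf{x} \le \mathbf{g} \land \mathbf{x}' = M\mathbf{x} + \mathbf{m}$$

for some matrices $G \in \mathbb{R}^{m \times n}$, $M \in \mathbb{R}^{n \times n}$, and vectors $\mathbf{g} \in \mathbb{R}^m$ and $\mathbf{m} \in \mathbb{R}^n$.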


Definition 3 (Nontermination). A linear lasso program $L$ is nonterminating iff there is an infinite sequence of states $\mathbf{x}_0, \mathbf{x}_1, \ldots$, called an infinite execution of $L$, such that $(\mathbf{x}_0, \mathbf{x}_1) \in \mathrm{STEM}$ and $(\mathbf{x}_t, \mathbf{x}_{t+1}) \in \mathrm{LOOP}$ for all $t \ge 1$.
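For a deterministic linear loop program (Definition 2), checking that a finite sequence of states is consistent with Definition 3 amounts to checking $G\mathbf{x} \le \mathbf{g}$ and $\mathbf{x}' = M\mathbf{x} + \mathbf{m}$ for every pair of consecutive states. A finite prefix can only refute a candidate execution, never prove nontermination. The sketch below writes Fig. 1b in this form (the guard a + b ≥ 3 becomes −a − b ≤ −3); the function names are hypothetical.

```python
from typing import List, Sequence


def matvec(A: Sequence[Sequence[int]], v: Sequence[int]) -> List[int]:
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def in_loop(G, g, M, m, x, xp) -> bool:
    """(x, x') in LOOP  iff  G x <= g  and  x' = M x + m  (Definition 2)."""
    guard_ok = all(lhs <= rhs for lhs, rhs in zip(matvec(G, x), g))
    update_ok = list(xp) == [mx + mi for mx, mi in zip(matvec(M, x), m)]
    return guard_ok and update_ok


def is_execution_prefix(G, g, M, m, states) -> bool:
    """Check (x_t, x_{t+1}) in LOOP for all consecutive pairs of the given states."""
    return all(in_loop(G, g, M, m, x, xp) for x, xp in zip(states, states[1:]))


# Fig. 1b as a deterministic loop: while (-a - b <= -3): (a, b) := (3a - 2, 2b)
G, g = [[-1, -1]], [-3]
M, m = [[3, 0], [0, 2]], [-2, 0]
print(is_execution_prefix(G, g, M, m, [(2, 1), (4, 2), (10, 4), (28, 8)]))  # True
print(is_execution_prefix(G, g, M, m, [(2, 1), (4, 3)]))                    # False
```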


2.2 Jordan Normal Form 


Let $M \in \mathbb{R}^{n \times n}$ be a real square matrix. If there is an invertible square matrix $S$ and a diagonal matrix $D$ such that $M = SDS^{-1}$, then $M$ is called diagonalizable. The column vectors of $S$ form the basis over which $M$ has diagonal form. In general, real matrices are not diagonalizable. However, every real square matrix $M$ with real eigenvalues has a representation which is almost diagonal, called Jordan normal form. This is a matrix that is zero except for the eigenvalues on the diagonal and one superdiagonal containing ones and zeros.

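Formally, a Jordan normal form is a matrix $J = \mathrm{diag}(J_{i_1}(\lambda_1), \ldots, J_{i_k}(\lambda_k))$ where $\lambda_1, \ldots, \lambda_k$ are the eigenvalues of $M$ and the real square matrices $J_i(\lambda) \in \mathbb{R}^{i \times i}$ are Jordan blocks,

$$J_i(\lambda) = \begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 & 0\\
0 & \lambda & 1 & \cdots & 0 & 0\\
\vdots & & & \ddots & & \vdots\\
0 & 0 & 0 & \cdots & \lambda & 1\\
0 & 0 & 0 & \cdots & 0 & \lambda
\end{pmatrix}.$$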
The subspace corresponding to each distinct eigenvalue is called a generalized eigenspace, and its basis vectors are called generalized eigenvectors.


Theorem 4 (Jordan Normal Form). For each real square matrix $M \in \mathbb{R}^{n \times n}$ with real eigenvalues, there is an invertible real square matrix $V \in \mathbb{R}^{n \times n}$ and a Jordan normal form $J \in \mathbb{R}^{n \times n}$ such that $M = VJV^{-1}$.


3 Geometric Nontermination Arguments 


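Fix a conjunctive linear lasso program $L = (\mathrm{STEM}, \mathrm{LOOP})$ and let $A \in \mathbb{R}^{m \times 2n}$ and $\mathbf{b} \in \mathbb{R}^m$ define the loop transition such that

$$(\mathbf{x}, \mathbf{x}') \in \mathrm{LOOP} \iff A\begin{pmatrix}\mathbf{x}\\ \mathbf{x}'\end{pmatrix} \le \mathbf{b}.$$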


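Definition 5 (Geometric Nontermination Argument). A tuple $(\mathbf{x}_0, \mathbf{x}_1, \mathbf{y}_1, \ldots, \mathbf{y}_s, \lambda_1, \ldots, \lambda_s, \mu_1, \ldots, \mu_{s-1})$ is called a geometric nontermination argument for the linear lasso program $L = (\mathrm{STEM}, \mathrm{LOOP})$ iff all of the following statements hold.

(domain) $\mathbf{x}_0, \mathbf{x}_1, \mathbf{y}_1, \ldots, \mathbf{y}_s \in \mathbb{R}^n$, and $\lambda_1, \ldots, \lambda_s, \mu_1, \ldots, \mu_{s-1} \ge 0$
(initiation) $(\mathbf{x}_0, \mathbf{x}_1) \in \mathrm{STEM}$
(point) $A\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_1 + \sum_{k=1}^{s} \mathbf{y}_k\end{pmatrix} \le \mathbf{b}$
(ray) $A\begin{pmatrix}\mathbf{y}_1\\ \lambda_1\mathbf{y}_1\end{pmatrix} \le \mathbf{0}$ and $A\begin{pmatrix}\mathbf{y}_k\\ \lambda_k\mathbf{y}_k + \mu_{k-1}\mathbf{y}_{k-1}\end{pmatrix} \le \mathbf{0}$ for each $k \in \{2, \ldots, s\}$.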


The number $s \ge 0$ is the size of the geometric nontermination argument.

The existence of a geometric nontermination argument can be checked using an SMT solver. The constraints given by (domain), (initiation), (point), and (ray) are nonlinear algebraic constraints, and the satisfiability of these constraints is decidable.
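Checking whether a given candidate tuple satisfies Definition 5 is straightforward; the hard part, finding one, requires solving the nonlinear constraints. The sketch below checks (domain) (nonnegativity only), (initiation), (point) and (ray) for Fig. 1c, whose loop a + b ≥ 4, a′ = 3a + b, b′ = 2b is written as $A(\mathbf{x}; \mathbf{x}') \le \mathbf{b}$ over the variables (a, b, a′, b′), and whose stem is assumed to be b := 1 as in Fig. 1. The candidate uses $\mathbf{x}_1 = (3,1)^T$, $\mathbf{y}_1 = (4,0)^T$, $\mathbf{y}_2 = (3,1)^T$, $\lambda_1 = 3$, $\lambda_2 = 2$, $\mu_1 = 1$, with $\mathbf{x}_0 = (3,0)^T$ chosen arbitrarily since the stem only constrains b; helper names are hypothetical.

```python
from typing import Callable, List, Sequence

Vec = List[int]


def leq(A: Sequence[Sequence[int]], v: Sequence[int], rhs: Sequence[int]) -> bool:
    """Componentwise A v <= rhs."""
    return all(sum(a * x for a, x in zip(row, v)) <= r for row, r in zip(A, rhs))


def is_gnta(A, bvec, stem: Callable, x0: Vec, x1: Vec,
            ys: List[Vec], lams: List[int], mus: List[int]) -> bool:
    s = len(ys)
    if len(lams) != s or len(mus) != max(s - 1, 0):
        return False
    if any(v < 0 for v in lams + mus):                      # (domain)
        return False
    if not stem(x0, x1):                                    # (initiation)
        return False
    x2 = [xi + sum(y[i] for y in ys) for i, xi in enumerate(x1)]
    if not leq(A, x1 + x2, bvec):                           # (point)
        return False
    zeros = [0] * len(bvec)
    for k in range(s):                                      # (ray), k is 0-based here
        image = [lams[k] * v for v in ys[k]]
        if k > 0:
            image = [v + mus[k - 1] * w for v, w in zip(image, ys[k - 1])]
        if not leq(A, ys[k] + image, zeros):
            return False
    return True


# Fig. 1c loop over (a, b, a', b'): a + b >= 4, a' = 3a + b, b' = 2b.
A = [[-1, -1, 0, 0], [-3, -1, 1, 0], [3, 1, -1, 0], [0, -2, 0, 1], [0, 2, 0, -1]]
bvec = [-4, 0, 0, 0, 0]
stem = lambda x0, x1: x1[0] == x0[0] and x1[1] == 1  # b := 1
print(is_gnta(A, bvec, stem, [3, 0], [3, 1], [[4, 0], [3, 1]], [3, 2], [1]))  # True
print(is_gnta(A, bvec, stem, [3, 0], [3, 1], [[4, 0], [3, 1]], [3, 2], [0]))  # False
```

With $\mu_1 = 0$ the second (ray) condition fails, which reflects the remark in the introduction that Fig. 1c needs a direction contributing at two speeds.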


Proposition 6 (Soundness). If there is a geometric nontermination argument 
for a linear lasso program L, then L is nonterminating. 


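Proof. We define $Y := (\mathbf{y}_1 \ldots \mathbf{y}_s)$ as the matrix containing the vectors $\mathbf{y}_k$ as columns, and we define the following matrix.

$$U := \begin{pmatrix}
\lambda_1 & \mu_1 & 0 & \cdots & 0 & 0\\
0 & \lambda_2 & \mu_2 & \cdots & 0 & 0\\
\vdots & & \ddots & \ddots & & \vdots\\
0 & 0 & 0 & \cdots & \lambda_{s-1} & \mu_{s-1}\\
0 & 0 & 0 & \cdots & 0 & \lambda_s
\end{pmatrix} \quad (1)$$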


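Following Definition 3 we show that the linear lasso program $L$ has the infinite execution

$$\mathbf{x}_0,\; \mathbf{x}_1,\; \mathbf{x}_1 + Y\mathbf{1},\; \mathbf{x}_1 + Y\mathbf{1} + YU\mathbf{1},\; \mathbf{x}_1 + Y\mathbf{1} + YU\mathbf{1} + YU^2\mathbf{1},\; \ldots \quad (2)$$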
From (initiation) we get $(\mathbf{x}_0, \mathbf{x}_1) \in \mathrm{STEM}$. It remains to show that

$$\left(\mathbf{x}_1 + \sum_{j=0}^{t-1} YU^j\mathbf{1},\; \mathbf{x}_1 + \sum_{j=0}^{t} YU^j\mathbf{1}\right) \in \mathrm{LOOP} \quad \text{for all } t \in \mathbb{N}. \quad (3)$$


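According to (domain) the matrix $U$ has only nonnegative entries, so the same holds for the matrix $Z := \sum_{j=0}^{t-1} U^j$. Hence $Z\mathbf{1}$ has only nonnegative entries, and thus $YZ\mathbf{1}$ can be written as $\sum_{k=1}^{s} \alpha_k \mathbf{y}_k$ for some $\alpha_k \ge 0$. We multiply inequality number $k$ from (ray) with $\alpha_k$ and get

$$A\begin{pmatrix}\alpha_k\mathbf{y}_k\\ \lambda_k\alpha_k\mathbf{y}_k + \mu_{k-1}\alpha_k\mathbf{y}_{k-1}\end{pmatrix} \le \mathbf{0}, \quad (4)$$

where we define for convenience $\mathbf{y}_0 := \mathbf{0}$ and $\mu_0 := 0$. Now we sum (4) for all $k$ and add (point) to get

$$A\begin{pmatrix}\mathbf{x}_1 + \sum_{k=1}^{s}\alpha_k\mathbf{y}_k\\ \mathbf{x}_1 + \sum_{k=1}^{s}\mathbf{y}_k + \sum_{k=1}^{s}(\alpha_k\lambda_k\mathbf{y}_k + \alpha_k\mu_{k-1}\mathbf{y}_{k-1})\end{pmatrix} \le \mathbf{b}. \quad (5)$$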


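By definition of $\alpha_k$, we have

$$\mathbf{x}_1 + \sum_{k=1}^{s}\alpha_k\mathbf{y}_k = \mathbf{x}_1 + YZ\mathbf{1} = \mathbf{x}_1 + \sum_{j=0}^{t-1} YU^j\mathbf{1}$$

and

$$\begin{aligned}
\mathbf{x}_1 + \sum_{k=1}^{s}\mathbf{y}_k + \sum_{k=1}^{s}(\alpha_k\lambda_k\mathbf{y}_k + \alpha_k\mu_{k-1}\mathbf{y}_{k-1})
&= \mathbf{x}_1 + Y\mathbf{1} + \sum_{k=1}^{s}\alpha_k YU\mathbf{e}_k\\
&= \mathbf{x}_1 + Y\mathbf{1} + YUZ\mathbf{1}\\
&= \mathbf{x}_1 + \sum_{j=0}^{t} YU^j\mathbf{1}.
\end{aligned}$$

Therefore (3) and (5) are the same, which concludes this proof.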


Proposition 7 (Closed Form of the Infinite Execution). For $t \ge 2$ the following is the closed form of the state $\mathbf{x}_t = \mathbf{x}_1 + \sum_{j=0}^{t-2} YU^j\mathbf{1}$ in the infinite execution (2). Let $U =: N + D$ where $N$ is a nilpotent matrix and $D$ is a diagonal matrix.


j j s j—k+1 j k+i—1 

vi1=Y (>: x D ) 15u D es Ie o 
i=0 k=1 i=0 l=k 

4 Completeness Results 

First we show that a linear loop program has a GNTA if it has a bounded infinite execution. In the next section we use this to prove our completeness result.


272 J. Leike and M. Heizmann 


4.1 Bounded Infinite Executions 


Let $\|\cdot\| : \mathbb{R}^n \to \mathbb{R}$ denote some norm. We call an infinite execution $(\mathbf{x}_t)_{t \ge 0}$ bounded iff there is a real number $d \in \mathbb{R}$ such that the norm of each state is bounded by $d$, i.e., $\|\mathbf{x}_t\| \le d$ for all $t$ (in $\mathbb{R}^n$ the notion of boundedness is independent of the choice of the norm).


Lemma 8 (Fixed Point). Let $L = (\mathrm{true}, \mathrm{LOOP})$ be a linear loop program. The linear loop program $L$ has a bounded infinite execution if and only if there is a fixed point $\mathbf{x}^* \in \mathbb{R}^n$ such that $(\mathbf{x}^*, \mathbf{x}^*) \in \mathrm{LOOP}$.


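Proof. If there is a fixed point $\mathbf{x}^*$, then the loop has the infinite bounded execution $\mathbf{x}^*, \mathbf{x}^*, \ldots$. Conversely, let $(\mathbf{x}_t)_{t \ge 0}$ be an infinite bounded execution. Boundedness implies that there is a $d \in \mathbb{R}$ such that $\|\mathbf{x}_t\| \le d$ for all $t$. Consider the sequence $\mathbf{z}_k := \frac{1}{k}\sum_{t=1}^{k}\mathbf{x}_t$. Then

$$\begin{aligned}
\|\mathbf{z}_k - \mathbf{z}_{k+1}\|
&= \left\|\frac{1}{k}\sum_{t=1}^{k}\mathbf{x}_t - \frac{1}{k+1}\sum_{t=1}^{k+1}\mathbf{x}_t\right\|
= \frac{1}{k(k+1)}\left\|(k+1)\sum_{t=1}^{k}\mathbf{x}_t - k\sum_{t=1}^{k+1}\mathbf{x}_t\right\|\\
&= \frac{1}{k(k+1)}\left\|\sum_{t=1}^{k}\mathbf{x}_t - k\,\mathbf{x}_{k+1}\right\|
\le \frac{1}{k(k+1)}\left(\sum_{t=1}^{k}\|\mathbf{x}_t\| + k\|\mathbf{x}_{k+1}\|\right)\\
&\le \frac{k \cdot d + k \cdot d}{k(k+1)} = \frac{2d}{k+1} \longrightarrow 0 \quad \text{as } k \to \infty.
\end{aligned}$$

Moreover, $\|\mathbf{z}_k\| \le d$ for all $k$, so by the Bolzano–Weierstrass theorem the sequence $(\mathbf{z}_k)_{k \ge 1}$ has a subsequence $(\mathbf{z}_{k_i})_{i \ge 1}$ that converges to some $\mathbf{z}^* \in \mathbb{R}^n$; by the bound above, $(\mathbf{z}_{k_i+1})_{i \ge 1}$ converges to $\mathbf{z}^*$ as well. We will show that $\mathbf{z}^*$ is the desired fixed point.

For all $t$, the polyhedron $Q := \left\{\begin{pmatrix}\mathbf{x}\\ \mathbf{x}'\end{pmatrix} \,\middle|\, A\begin{pmatrix}\mathbf{x}\\ \mathbf{x}'\end{pmatrix} \le \mathbf{b}\right\}$ contains $\begin{pmatrix}\mathbf{x}_t\\ \mathbf{x}_{t+1}\end{pmatrix}$ and is convex. Therefore, for all $k \ge 1$,

$$\frac{1}{k}\sum_{t=1}^{k}\begin{pmatrix}\mathbf{x}_t\\ \mathbf{x}_{t+1}\end{pmatrix} \in Q.$$

Together with

$$\frac{1}{k}\sum_{t=1}^{k}\begin{pmatrix}\mathbf{x}_t\\ \mathbf{x}_{t+1}\end{pmatrix} = \begin{pmatrix}\mathbf{z}_k\\ \frac{k+1}{k}\mathbf{z}_{k+1} - \frac{1}{k}\mathbf{x}_1\end{pmatrix}$$

we infer

$$\begin{pmatrix}\mathbf{z}_k\\ \frac{k+1}{k}\mathbf{z}_{k+1} - \frac{1}{k}\mathbf{x}_1\end{pmatrix} \in Q,$$

and since $Q$ is topologically closed we have

$$\begin{pmatrix}\mathbf{z}^*\\ \mathbf{z}^*\end{pmatrix} = \lim_{i \to \infty}\begin{pmatrix}\mathbf{z}_{k_i}\\ \frac{k_i+1}{k_i}\mathbf{z}_{k_i+1} - \frac{1}{k_i}\mathbf{x}_1\end{pmatrix} \in Q,$$

i.e., $(\mathbf{z}^*, \mathbf{z}^*) \in \mathrm{LOOP}$.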
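The averaging construction in this proof can be observed on a small example. The loop used below, while (0 ≤ a ≤ 1): a := 1 − a, is a hypothetical illustration and not taken from the text: its execution 1, 0, 1, 0, ... is bounded (d = 1), consecutive averages satisfy the bound 2d/(k + 1) derived above, and the averages approach 1/2, which is indeed a fixed point of the loop.

```python
from fractions import Fraction
from typing import List


def loop(a: Fraction, a_next: Fraction) -> bool:
    """Hypothetical loop: guard 0 <= a <= 1, update a' = 1 - a."""
    return 0 <= a <= 1 and a_next == 1 - a


def averages(xs: List[Fraction]) -> List[Fraction]:
    """z_k = (x_1 + ... + x_k) / k for k = 1, ..., len(xs)."""
    total, out = Fraction(0), []
    for k, x in enumerate(xs, start=1):
        total += x
        out.append(total / k)
    return out


# Bounded execution x_1, x_2, ... = 1, 0, 1, 0, ... (d = 1 bounds every state).
xs = [Fraction(t % 2) for t in range(1, 101)]
assert all(loop(x, x_next) for x, x_next in zip(xs, xs[1:]))

zs = averages(xs)
# Consecutive averages differ by at most 2d/(k+1), as in the proof.
assert all(abs(zs[k - 1] - zs[k]) <= Fraction(2, k + 1) for k in range(1, len(zs)))
print(zs[-1], zs[-2])        # 1/2 50/99
z_star = Fraction(1, 2)
print(loop(z_star, z_star))  # True: the limit is a fixed point of the loop
```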
Note that Lemma 8 does not transfer to lasso programs: there might only be one fixed point, and the stem might exclude this point (e.g., $a = -0.5$ and $b = 3.5$ in the example of Fig. 1a).

Because fixed points give rise to trivial geometric nontermination arguments, 
we can derive a criterion for the existence of geometric nontermination arguments 
from Lemma 8. 


Corollary 9 (Bounded Infinite Executions). If the linear loop program $L = (\mathrm{true}, \mathrm{LOOP})$ has a bounded infinite execution, then it has a geometric nontermination argument of size 0.

Proof. By Lemma 8 there is a fixed point $\mathbf{x}^*$ such that $(\mathbf{x}^*, \mathbf{x}^*) \in \mathrm{LOOP}$. We choose $\mathbf{x}_0 = \mathbf{x}_1 = \mathbf{x}^*$, which satisfies (point) and (ray) and thus is a geometric nontermination argument for $L$.


Example 10. Note that according to our definition of a linear lasso program, the relation LOOP is a topologically closed set. If we allowed the formula defining LOOP to also contain strict inequalities, Lemma 8 would no longer hold: the following program is nonterminating and has a bounded infinite execution, but it does not have a fixed point. However, the topological closure of the relation LOOP contains the fixed point $a = 0$.

while (a > 0):
    a := a / 2;

Nevertheless, this example has a geometric nontermination argument, namely $x_1 = 1$, $y_1 = -0.5$, $\lambda_1 = 0.5$.
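The geometric nontermination argument of Example 10 can be checked against the loop directly: with $x_1 = 1$, $y_1 = -0.5$ and $\lambda_1 = 0.5$, the $t$-th state $x_1 + \sum_{i=0}^{t-2} \lambda_1^i y_1$ equals $(1/2)^{t-1}$, which is exactly what the loop computes, and every state satisfies the strict guard. The sketch uses exact fractions; the helper name is hypothetical.

```python
from fractions import Fraction

x1, y1, lam1 = Fraction(1), Fraction(-1, 2), Fraction(1, 2)


def gnta_state(t: int) -> Fraction:
    """x_t = x_1 + sum_{i=0}^{t-2} lam1**i * y1 for t >= 1."""
    return x1 + sum((lam1 ** i * y1 for i in range(t - 1)), Fraction(0))


a = x1
for t in range(1, 30):
    assert a > 0                                      # the strict guard holds
    assert gnta_state(t) == a == Fraction(1, 2 ** (t - 1))
    a = a / 2                                         # loop body: a := a / 2
print(gnta_state(4))  # 1/8
```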


4.2 Nonnegative Eigenvalues 


This section is dedicated to the proof of the following completeness result for 
deterministic linear loop programs. 


Theorem 11 (Completeness). If a deterministic linear loop program $L$ of the form while $(G\mathbf{x} \le \mathbf{g})$ do $\mathbf{x} := M\mathbf{x} + \mathbf{m}$ with $n$ variables is nonterminating and $M$ has only nonnegative real eigenvalues, then there is a geometric nontermination argument for $L$ of size at most $n$.


To prove this completeness theorem, we need to construct a GNTA from a 
given infinite execution. The following lemma shows that we can restrict our 
construction to exclude all linear subspaces that have a bounded execution. 


Lemma 12 (Loop Disassembly). Let $L = (\mathrm{true}, \mathrm{LOOP})$ be a linear loop program over $\mathbb{R}^n = \mathcal{U} \oplus \mathcal{V}$ where $\mathcal{U}$ and $\mathcal{V}$ are linear subspaces of $\mathbb{R}^n$. Suppose $L$ is nonterminating and there is an infinite execution that is bounded when projected to the subspace $\mathcal{U}$. Let $\mathbf{x}^{\mathcal{U}}$ be the fixed point in $\mathcal{U}$ that exists according to Lemma 8. Then the linear loop program $L^{\mathcal{V}}$ that we get by projecting to the subspace $\mathcal{V} + \mathbf{x}^{\mathcal{U}}$ is nonterminating. Moreover, if $L^{\mathcal{V}}$ has a GNTA of size $s$, then $L$ has a GNTA of size $s$.
Proof. Without loss of generality, we are in the basis of $\mathcal{U}$ and $\mathcal{V}$ so that these spaces are nicely separated by the use of different variables. Using the infinite execution of $L$ that is bounded on $\mathcal{U}$, we can do the construction from the proof of Lemma 8 to get an infinite execution $\mathbf{z}_0, \mathbf{z}_1, \ldots$ that yields the fixed point $\mathbf{x}^{\mathcal{U}}$ when projected to $\mathcal{U}$. We fix $\mathbf{x}^{\mathcal{U}}$ in the loop transition by replacing all variables from $\mathcal{U}$ with the values from $\mathbf{x}^{\mathcal{U}}$ and get the linear loop program $L^{\mathcal{V}}$ (this is the projection to $\mathcal{V} + \mathbf{x}^{\mathcal{U}}$). Importantly, the projection of $\mathbf{z}_0, \mathbf{z}_1, \ldots$ to $\mathcal{V} + \mathbf{x}^{\mathcal{U}}$ is still an infinite execution, hence the loop $L^{\mathcal{V}}$ is nonterminating. Given a GNTA for $L^{\mathcal{V}}$, we can construct a GNTA for $L$ by adding the vector $\mathbf{x}^{\mathcal{U}}$ to $\mathbf{x}_0$ and $\mathbf{x}_1$.


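Proof (of Theorem 11). The polyhedron corresponding to the loop transition of the deterministic linear loop program $L$ is

$$\begin{pmatrix}G & 0\\ M & -I\\ -M & I\end{pmatrix}\begin{pmatrix}\mathbf{x}\\ \mathbf{x}'\end{pmatrix} \le \begin{pmatrix}\mathbf{g}\\ -\mathbf{m}\\ \mathbf{m}\end{pmatrix}. \quad (6)$$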
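The stacked system (6) can be assembled mechanically from $G$, $\mathbf{g}$, $M$ and $\mathbf{m}$. The sketch below does so with plain lists and checks it on Fig. 1c (guard written as −a − b ≤ −4, $\mathbf{m} = \mathbf{0}$); the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[int]]


def loop_polyhedron(G: Matrix, g: List[int], M: Matrix, m: List[int]) -> Tuple[Matrix, List[int]]:
    """Rows (G 0), (M -I), (-M I) with right-hand side (g, -m, m), as in (6)."""
    n = len(M)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    A = [row + [0] * n for row in G]
    A += [M[i] + [-v for v in identity[i]] for i in range(n)]
    A += [[-v for v in M[i]] + identity[i] for i in range(n)]
    return A, g + [-v for v in m] + m


def satisfies(A: Matrix, rhs: Sequence[int], v: Sequence[int]) -> bool:
    return all(sum(a * x for a, x in zip(row, v)) <= r for row, r in zip(A, rhs))


A, rhs = loop_polyhedron([[-1, -1]], [-4], [[3, 1], [0, 2]], [0, 0])
print(satisfies(A, rhs, [3, 1, 10, 2]))  # True: (3, 1) steps to (10, 2)
print(satisfies(A, rhs, [3, 1, 10, 3]))  # False: wrong successor
```

The pair of rows $(M\ {-I})$ and $({-M}\ I)$ encodes the equality $\mathbf{x}' = M\mathbf{x} + \mathbf{m}$ as two inequalities, which is what makes the deterministic loop fit the conjunctive form of Definition 1.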


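Define $\mathcal{Y}$ to be the convex cone spanned by the rays of the guard polyhedron:

$$\mathcal{Y} := \{\mathbf{y} \in \mathbb{R}^n \mid G\mathbf{y} \le \mathbf{0}\}$$

Let $\overline{\mathcal{Y}}$ be the smallest linear subspace of $\mathbb{R}^n$ that contains $\mathcal{Y}$, i.e., $\overline{\mathcal{Y}} = \mathcal{Y} - \mathcal{Y}$ using pointwise subtraction, and let $\overline{\mathcal{Y}}^{\perp}$ be the linear subspace of $\mathbb{R}^n$ orthogonal to $\overline{\mathcal{Y}}$; hence $\mathbb{R}^n = \overline{\mathcal{Y}} \oplus \overline{\mathcal{Y}}^{\perp}$.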

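Let $P := \{\mathbf{x} \in \mathbb{R}^n \mid G\mathbf{x} \le \mathbf{g}\}$ denote the guard polyhedron. Its projection $P^{\overline{\mathcal{Y}}^{\perp}}$ to the subspace $\overline{\mathcal{Y}}^{\perp}$ is again a polyhedron. By the decomposition theorem for polyhedra [36, Corollary 7.1b], $P^{\overline{\mathcal{Y}}^{\perp}} = Q + C$ for some polytope $Q$ and some convex cone $C$. However, by definition of the subspace $\overline{\mathcal{Y}}^{\perp}$, the convex cone $C$ must be equal to $\{\mathbf{0}\}$: for any $\mathbf{y} \in C \subseteq \overline{\mathcal{Y}}^{\perp}$, we have $G\mathbf{y} \le \mathbf{0}$, thus $\mathbf{y} \in \mathcal{Y}$, and therefore $\mathbf{y}$ is orthogonal to itself, i.e., $\mathbf{y} = \mathbf{0}$. We conclude that $P^{\overline{\mathcal{Y}}^{\perp}}$ must be a polytope, and thus it is bounded. By assumption $L$ is nonterminating, so $L^{\overline{\mathcal{Y}}^{\perp}}$ is nonterminating, and since $P^{\overline{\mathcal{Y}}^{\perp}}$ is bounded, any infinite execution of $L^{\overline{\mathcal{Y}}^{\perp}}$ must be bounded.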

Let $\mathcal{U}$ denote the direct sum of the generalized eigenspaces for the eigenvalues $0 \le \lambda < 1$. Any infinite execution is necessarily bounded on the subspace $\mathcal{U}$, since on this space the map $\mathbf{x} \mapsto M\mathbf{x} + \mathbf{m}$ is a contraction. Let $\mathcal{U}^{\perp}$ denote the subspace of $\mathbb{R}^n$ orthogonal to $\mathcal{U}$. The space $\overline{\mathcal{Y}} \cap \mathcal{U}^{\perp}$ is a linear subspace of $\mathbb{R}^n$, and any infinite execution in its complement is bounded. Hence, according to Lemma 12, we can turn our analysis to the subspace $\overline{\mathcal{Y}} \cap \mathcal{U}^{\perp} + \mathbf{x}$ for some $\mathbf{x} \in \overline{\mathcal{Y}}^{\perp} \oplus \mathcal{U}$ for the rest of the proof. From now on, we implicitly assume that we are in this space without changing any of the notation.


Part 1. In this part we show that there is a basis $\mathbf{y}_1, \ldots, \mathbf{y}_s \in \mathcal{Y}$ such that $M$ turns into a matrix $U$ of the form given in (1) with $\lambda_1, \ldots, \lambda_s, \mu_1, \ldots, \mu_{s-1} \ge 0$. Since we allow $\mu_k$ to be positive between different eigenvalues (Example 14 illustrates why), this is not necessarily a Jordan normal form, and the vectors $\mathbf{y}_k$ are not necessarily generalized eigenvectors.

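We choose a basis $\mathbf{v}_1, \ldots, \mathbf{v}_s$ such that $M$ is in Jordan normal form with the eigenvalues ordered by size such that the largest eigenvalues come first. Define $\mathcal{V}_1 := \overline{\mathcal{Y}} \cap \mathcal{U}^{\perp}$ and let $\mathcal{V}_1 \supset \ldots \supset \mathcal{V}_s$ be a strictly descending chain of linear subspaces where $\mathcal{V}_k$ is spanned by $\mathbf{v}_k, \ldots, \mathbf{v}_s$.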

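We define a basis $\mathbf{w}_1, \ldots, \mathbf{w}_s$ by doing the following for each Jordan block of $M$, starting with $k = 1$. Let $M^{(k)}$ be the projection of $M$ to the linear subspace $\mathcal{V}_k$ and let $\lambda$ be the largest eigenvalue of $M^{(k)}$. The $m$-fold iteration of a Jordan block $J_\ell(\lambda)$ for $m \ge \ell$ is given by

$$J_\ell(\lambda)^m = \begin{pmatrix}
\lambda^m & \binom{m}{1}\lambda^{m-1} & \cdots & \binom{m}{\ell}\lambda^{m-\ell}\\
 & \lambda^m & \cdots & \binom{m}{\ell-1}\lambda^{m-\ell+1}\\
 & & \ddots & \vdots\\
0 & & & \lambda^m
\end{pmatrix}. \quad (7)$$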
Let $\mathbf{z}_0, \mathbf{z}_1, \mathbf{z}_2, \ldots$ be an infinite execution of the loop $L$ in the basis $\mathbf{v}_k, \ldots, \mathbf{v}_s$ projected to the space $\mathcal{V}_k$. Since by Lemma 12 we can assume that there are no fixed points on this space, $|\mathbf{z}_t| \to \infty$ as $t \to \infty$ in each of the top $\ell$ components. Asymptotically, the largest eigenvalue $\lambda_k$ dominates, and in each row of $J_\ell(\lambda_k)^m$ (7), the entries $\binom{m}{j}\lambda^{m-j}$ in the rightmost column grow the fastest, with an asymptotic rate of $O(m^j \exp(m))$. Therefore the sign of the component corresponding to basis vector $\mathbf{v}_{k+\ell}$ determines whether the top $\ell$ entries tend to $+\infty$ or $-\infty$, but the top $\ell$ entries of $\mathbf{z}_t$ corresponding to the top Jordan block will all have the same sign eventually. Because no state can violate the guard condition, the guard cannot constrain the infinite execution in the direction of $\mathbf{v}_j$ or $-\mathbf{v}_j$, i.e., $G^{\mathcal{V}_k}\mathbf{v}_j \le \mathbf{0}$ for each $j \in \{k, \ldots, k+\ell\}$ or $G^{\mathcal{V}_k}\mathbf{v}_j \ge \mathbf{0}$ for each $j \in \{k, \ldots, k+\ell\}$, where $G^{\mathcal{V}_k}$ is the projection of $G$ to the subspace $\mathcal{V}_k$. So without loss of generality the former holds (otherwise we use $-\mathbf{v}_j$ instead of $\mathbf{v}_j$ for $j \in \{k, \ldots, k+\ell\}$), and for $j \in \{k, \ldots, k+\ell\}$ we get $\mathbf{v}_j \in \mathcal{Y} + \mathcal{V}_k^{\perp}$, where $\mathcal{V}_k^{\perp}$ is the space spanned by $\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}$. Hence there is a $\mathbf{u}_j \in \mathcal{V}_k^{\perp}$ such that $\mathbf{w}_j := \mathbf{v}_j + \mathbf{u}_j$ is an element of $\mathcal{Y}$. Now we move on to the subspace $\mathcal{V}_{k+\ell+1}$, discarding the top Jordan block.
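Formula (7) for the powers of a Jordan block can be sanity-checked numerically: the entry in row $i$ and column $j \ge i$ of $J(\lambda)^m$ equals $\binom{m}{j-i}\lambda^{m-(j-i)}$, because $J(\lambda) = \lambda I + N$ with commuting summands. The sketch below compares repeated multiplication with this closed form in exact integers; it relies on Python's `math.comb` (Python 3.8+), and the helper names are hypothetical.

```python
from math import comb
from typing import List

Matrix = List[List[int]]


def jordan_block(lam: int, size: int) -> Matrix:
    """lam on the diagonal, ones on the superdiagonal, zeros elsewhere."""
    return [[lam if i == j else int(j == i + 1) for j in range(size)] for i in range(size)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matpow(A: Matrix, m: int) -> Matrix:
    result = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(m):
        result = matmul(result, A)
    return result


def jordan_power_closed_form(lam: int, size: int, m: int) -> Matrix:
    """Entry (i, j) is C(m, j - i) * lam**(m - (j - i)) for j >= i, else 0 (needs m >= size - 1)."""
    return [[comb(m, j - i) * lam ** (m - (j - i)) if j >= i else 0 for j in range(size)]
            for i in range(size)]


for lam in (0, 1, 2, 3):
    for m in range(3, 8):
        assert matpow(jordan_block(lam, 4), m) == jordan_power_closed_form(lam, 4, m)
print(jordan_power_closed_form(3, 3, 4))  # [[81, 108, 54], [0, 81, 108], [0, 0, 81]]
```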

Let $T$ be the matrix $M$ written in the basis $\mathbf{w}_1, \ldots, \mathbf{w}_s$. Then $T$ is of upper triangular form: whenever we apply $M\mathbf{w}_k$, we get $\lambda_k\mathbf{w}_k + \mathbf{u}_k$ ($\mathbf{w}_k$ was an eigenvector in the space $\mathcal{V}_k$), where $\mathbf{u}_k \in \mathcal{V}_k^{\perp}$, the space spanned by $\mathbf{v}_1, \ldots, \mathbf{v}_{k-1}$ (which is identical with the space spanned by $\mathbf{w}_1, \ldots, \mathbf{w}_{k-1}$). Moreover, since we processed every Jordan block entirely, for $\mathbf{w}_k$ and $\mathbf{w}_j$ from the same generalized eigenspace ($T_{k,k} = T_{j,j}$) with $k > j$ we have

$$T_{j,k} \in \{0, 1\} \quad \text{and} \quad T_{j,k} = 1 \text{ implies } k = j + 1. \quad (8)$$

In other words, when projected to any generalized eigenspace, $T$ consists only of Jordan blocks.
Now we change basis again in order to get the upper triangular matrix $U$ defined in (1) from $T$. For this we define the vectors

$$\mathbf{y}_k := \beta_k \sum_{j=1}^{k} \alpha_{k,j}\mathbf{w}_j$$

with nonnegative real numbers $\alpha_{k,j} \ge 0$, $\alpha_{k,k} > 0$, and $\beta_k > 0$ to be determined later. Define the matrices $W := (\mathbf{w}_1 \ldots \mathbf{w}_s)$, $Y := (\mathbf{y}_1 \ldots \mathbf{y}_s)$, and $\alpha := (\alpha_{k,j})_{1 \le j \le k \le s}$. So $\alpha$ is a nonnegative lower triangular matrix with a positive diagonal and hence invertible. Since $\alpha$ and $W$ are invertible, the matrix $Y = W\alpha^T\mathrm{diag}(\boldsymbol{\beta})$ is invertible as well, and thus the vectors $\mathbf{y}_1, \ldots, \mathbf{y}_s$ form a basis. Moreover, we have $\mathbf{y}_k \in \mathcal{Y}$ for each $k$ since $\alpha \ge 0$, $\boldsymbol{\beta} > 0$, and $\mathcal{Y}$ is a convex cone. Therefore we get

$$GY \le 0. \quad (9)$$


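We will first choose $\alpha$. Define $T =: D + N$ where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_s)$ is a diagonal matrix and $N$ is nilpotent. Since $\mathbf{w}_1$ is an eigenvector of $M$, we have $M\mathbf{y}_1 = M\beta_1\alpha_{1,1}\mathbf{w}_1 = \lambda_1\beta_1\alpha_{1,1}\mathbf{w}_1 = \lambda_1\mathbf{y}_1$. To get the form in (1), we need for all $k > 1$

$$M\mathbf{y}_k = \lambda_k\mathbf{y}_k + \mu_{k-1}\mathbf{y}_{k-1}. \quad (10)$$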


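Written in the basis $\mathbf{w}_1, \ldots, \mathbf{w}_s$ (i.e., multiplied with $W^{-1}$),

$$(D + N)\beta_k\sum_{j \le k}\alpha_{k,j}\mathbf{e}_j = \lambda_k\beta_k\sum_{j \le k}\alpha_{k,j}\mathbf{e}_j + \mu_{k-1}\beta_{k-1}\sum_{j < k}\alpha_{k-1,j}\mathbf{e}_j.$$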


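Hence we want to pick $\alpha$ such that

$$\sum_{j \le k}\alpha_{k,j}(\lambda_j - \lambda_k)\mathbf{e}_j + N\sum_{j \le k}\alpha_{k,j}\mathbf{e}_j - \mu_{k-1}\frac{\beta_{k-1}}{\beta_k}\sum_{j < k}\alpha_{k-1,j}\mathbf{e}_j = \mathbf{0}. \quad (11)$$

First note that these constraints are independent of $\boldsymbol{\beta}$ if we set $\mu_{k-1} := \beta_k / \beta_{k-1} > 0$, since then the coefficient of the last sum in (11) is 1; so we can leave assigning a value to $\boldsymbol{\beta}$ to a later part of the proof.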

We distinguish two cases. First, if $\lambda_{k-1} \ne \lambda_k$, then $\lambda_j - \lambda_k$ is positive for all $j < k$ because larger eigenvalues come first. Since $N$ is nilpotent and upper triangular, $N \sum_{j \le k} \alpha_{k,j} e_j$ is a linear combination of $e_1, \ldots, e_{k-1}$ (i.e., only the first $k-1$ entries are nonzero). Whatever values this vector assumes, we can increase the parameters $\alpha_{k,j}$ for $j < k$ to make (11) larger and increase the parameters $\alpha_{k-1,j}$ for $j < k$ to make (11) smaller.

Second, let $\ell$ be minimal such that $\lambda_\ell = \lambda_k$ with $\ell \ne k$; then $w_\ell, \ldots, w_k$ are from the same generalized eigenspace. For the rows $1, \ldots, \ell - 1$ we can proceed as in the first case, and for the rows $\ell, \ldots, k-1$ we note that by (8) $N e_j = T_{j-1,j}\, e_{j-1}$. Hence the remaining constraints (11) are

$$\sum_{\ell < j \le k} \alpha_{k,j} T_{j-1,j}\, e_{j-1} - \mu_{k-1} \frac{\beta_{k-1}}{\beta_k} \sum_{\ell \le j < k} \alpha_{k-1,j}\, e_j = 0,$$

which is solved by $\alpha_{k,j+1} T_{j,j+1} = \mu_{k-1} \frac{\beta_{k-1}}{\beta_k} \alpha_{k-1,j}$ for $\ell \le j < k$. This is only a problem if there is a $j$ such that $T_{j-1,j} = 0$, i.e., if there are multiple Jordan blocks for the same eigenvalue. In this case, we can reduce the dimension of the generalized eigenspace to the dimension of the largest Jordan block by combining all Jordan blocks: if $M y_k = \lambda y_k + y_{k-1}$ and $M y_j = \lambda y_j + y_{j-1}$, then $M(y_k + y_j) = \lambda(y_k + y_j) + (y_{k-1} + y_{j-1})$, and if $M y_k = \lambda y_k + y_{k-1}$ and $M y_j = \lambda y_j$, then $M(y_k + y_j) = \lambda(y_k + y_j) + y_{k-1}$. In both cases we can replace the basis vector $y_k$ with $y_k + y_j$ without reducing the expressiveness of the GNTA.

Importantly, there are no cyclic dependencies in the values of $\alpha$ because none of the coefficients $\alpha$ can be made too large. Therefore we can choose $\alpha \ge 0$ such that (10) is satisfied for all $k > 1$, and hence the basis $y_1, \ldots, y_s$ brings $M$ into the desired form (1).
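The outcome of Part 1, that in the basis $y_1, \ldots, y_s$ the matrix $M$ takes the form (1) while (9) holds, can be checked mechanically for a concrete basis. The following Python sketch is our own illustration, not part of the proof: it computes $U = Y^{-1} M Y$ with exact rational arithmetic, tests that $U$ is zero outside its diagonal ($\lambda_k$) and superdiagonal ($\mu_{k-1}$), as (10) requires, and tests $GY \le 0$. The matrices `M`, `G` and the basis `Y` are hypothetical values (with `G` encoding a guard $a - b \ge 0 \wedge b \ge 0$).

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(A):
    """Exact inverse via Gauss-Jordan elimination over the rationals."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("basis vectors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def is_form_1(U):
    """True if U is zero outside its diagonal and superdiagonal, as (10) demands."""
    n = len(U)
    return all(U[i][j] == 0 for i in range(n) for j in range(n) if j not in (i, i + 1))


# Hypothetical example: M = diag(2, 1); guard a - b >= 0 and b >= 0 as G x <= g.
M = [[2, 0], [0, 1]]
G = [[-1, 1], [0, -1]]
Y = [[1, 1], [0, 1]]          # columns y1 = (1, 0) and y2 = (1, 1)

U = mat_mul(inverse(Y), mat_mul(M, Y))
GY = mat_mul(G, Y)
print([[int(v) for v in row] for row in U])   # prints [[2, 1], [0, 1]]
print(is_form_1(U), all(v <= 0 for row in GY for v in row))
```

Here $\lambda_1 = 2$, $\lambda_2 = 1$ and $\mu_1 = 1$, so even though $M$ is diagonal, $U$ has a nonzero superdiagonal entry.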


Part 2. In this part we construct the geometric nontermination argument and check the constraints from Definition 5. Since $L$ has an infinite execution, there is a point $x$ that fulfills the guard, i.e., $Gx \le g$. We choose $x_1 := x + Y\gamma$ with $\gamma \ge 0$ to be determined later. Moreover, we choose $\lambda_1, \ldots, \lambda_s$ and $\mu_1, \ldots, \mu_{s-1}$ from the entries of $U$ given in (1). The size of our GNTA is $s$, the number of vectors $y_1, \ldots, y_s$. These vectors form a basis of a subspace of $\mathbb{R}^n$; thus $s \le n$, as required.

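The constraint (domain) is satisfied by construction and the constraint (init) is vacuous since $L$ is a loop program. For (ray) note that from (9) and (10) we get

$$\begin{pmatrix} G & 0 \\ M & -I \\ -M & I \end{pmatrix} \begin{pmatrix} y_k \\ \lambda_k y_k + \mu_{k-1} y_{k-1} \end{pmatrix} \le \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.$$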


The remainder of this proof shows that we can choose $\beta$ and $\gamma$ such that (point) is satisfied, i.e., that

$$G x_1 \le g \quad\text{and}\quad M x_1 + m = x_1 + Y\mathbf{1}. \qquad (12)$$


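The vector $x_1$ satisfies the guard since $G x_1 = G x + G Y \gamma \le g + 0$ according to (9), which yields the first part of (12). For the second part we observe the following.

$$\begin{aligned} M x_1 + m &= x_1 + Y\mathbf{1} \\ \iff (M - I)(x + Y\gamma) + m &= Y\mathbf{1} \\ \iff (M - I)x + m &= Y\mathbf{1} - (M - I)Y\gamma \end{aligned}$$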


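Since $Y$ is a basis, it is invertible, so

$$\begin{aligned} Y^{-1}(M - I)x + Y^{-1}m &= \mathbf{1} - Y^{-1}(M - I)Y\gamma \\ \iff (U - I)Y^{-1}x + Y^{-1}m &= \mathbf{1} - (U - I)\gamma \\ \iff (U - I)\tilde{x} + \tilde{m} &= \mathbf{1} - (U - I)\gamma \qquad \text{(13)} \end{aligned}$$

with $\tilde{x} := Y^{-1}x = W^{-1}\alpha^{-1}\operatorname{diag}(\beta)^{-1}x$ and $\tilde{m} := Y^{-1}m = W^{-1}\alpha^{-1}\operatorname{diag}(\beta)^{-1}m$. Equation (13) is now conveniently in the basis $y_1, \ldots, y_s$, and all that remains to show is that we can choose $\gamma \ge 0$ and $\beta > 0$ such that (13) is satisfied.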

We proceed for each (not quite Jordan) block of $U$ separately, i.e., we assume that we are looking at the subspace spanned by $y_j, \ldots, y_k$ with $\mu_k = \mu_{j-1} = 0$ and $\mu_\ell > 0$ for all $\ell \in \{j, \ldots, k-1\}$. If this space only contains eigenvalues that are larger than 1, then $U - I$ is invertible and has only nonnegative entries. By using large enough values for $\beta$, we can make $\tilde{x}$ and $\tilde{m}$ small enough such that $\mathbf{1} \ge (U - I)\tilde{x} + \tilde{m}$. Then we just need to pick $\gamma$ appropriately.

If there is at least one eigenvalue 1, then $U - I$ is not invertible, so (13) could be overconstrained. Notice that $\mu_\ell > 0$ for all $\ell \in \{j, \ldots, k-1\}$, so only the bottom entry in the vector equation (13) is not covered by $\gamma$. Moreover, since eigenvalues are ordered in decreasing order and all eigenvalues in our current subspace are $\ge 1$, we conclude that the eigenvalue for the bottom entry is 1. (Furthermore, $k$ is the highest index since each eigenvalue occurs only in one block.) Thus we get the equation $\tilde{m}_k = 1$. If $\tilde{m}_k$ is positive, this equation has a solution since we can adjust $\beta_k$ accordingly. If it is zero, then the execution on the space spanned by $y_k$ is bounded, which we can rule out by Lemma 12.

It remains to rule out that $\tilde{m}_k$ is negative. Let $\mathcal{U}$ be the generalized eigenspace to the eigenvalue 1 and use Lemma 13 below to conclude that $o := N^{k-1}m + u \in \mathcal{Y}$ for some $u \in \mathcal{U}^\perp$. We have that $Mo = M(N^{k-1}m + u) = Mu \in \mathcal{U}^\perp$, so $o$ is a candidate to pick for the vector $w_k$. Therefore, without loss of generality, we did so in Part 1 of this proof, and since $y_k$ is in the convex cone spanned by the basis $w_1, \ldots, w_s$ we get $\tilde{m}_k \ge 0$.


Lemma 13 (Deterministic Loops with Eigenvalue 1). Let $M = I + N$ and let $N$ be nilpotent with nilpotence index $k$ ($k := \min\{i \mid N^i = 0\}$). If $G N^{k-1} m \not\le 0$, then $L$ is terminating.


Proof. We show termination by providing a $k$-nested ranking function [28, Definition 4.7]. By [28, Lemma 3.3] and [28, Theorem 4.10], this implies that $L$ is terminating.

According to the premise, $G N^{k-1} m \not\le 0$; hence there is at least one positive entry in the vector $G N^{k-1} m$. Let $h^T$ be a row of $G$ such that $h^T N^{k-1} m =: \delta > 0$, and let $h_0 \in \mathbb{R}$ be the corresponding entry in $g$. Let $x$ be any state and let $x'$ be a next state after the loop transition, i.e., $x' = Mx + m$. Define the affine-linear functions $f_j(x) := -h^T N^{k-j} x + c_j$ for $1 \le j \le k$ with constants $c_j \in \mathbb{R}$ to be determined later. Since every state $x$ satisfies the guard, we have $h^T x \le h_0$, hence $f_k(x) = -h^T x + c_k \ge -h_0 + c_k > 0$ for $c_k := h_0 + 1$.


$$\begin{aligned} f_1(x') = f_1(x + Nx + m) &= -h^T N^{k-1}(x + Nx + m) + c_1 \\ &= f_1(x) - h^T N^k x - h^T N^{k-1} m \\ &\le f_1(x) - 0 - \delta \end{aligned}$$

For $1 < j \le k$,

$$\begin{aligned} f_j(x') = f_j(x + Nx + m) &= -h^T N^{k-j}(x + Nx + m) + c_j \\ &= f_j(x) + f_{j-1}(x) - h^T N^{k-j} m - c_{j-1} \\ &< f_j(x) + f_{j-1}(x) \end{aligned}$$

for $c_{j-1} := -h^T N^{k-j} m + 1$.
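The construction in the proof of Lemma 13 is explicit enough to execute. The Python sketch below builds $\delta$ and $f_1, \ldots, f_k$ as in the proof (with $c_k := h_0 + 1$ and $c_{j-1} := -h^T N^{k-j} m + 1$) and then checks the three inequalities used above on a grid of states that satisfy the guard. The loop `while (x1 <= 10): x1 := x1 + x2; x2 := x2 + 1` is a hypothetical example with $N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, $k = 2$ and $m = (0, 1)^T$; the grid check is a sanity test, not a proof.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_pow(A, e):
    """A**e for a square matrix A and e >= 0."""
    R = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(e):
        R = mat_mul(R, A)
    return R


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def nested_ranking(N, m, h, h0, k):
    """delta and f_1..f_k from the proof; f_j is returned as (coefficients, c_j)."""
    row = {j: mat_mul([h], mat_pow(N, k - j))[0] for j in range(1, k + 1)}  # h^T N^(k-j)
    delta = dot(row[1], m)                       # h^T N^(k-1) m
    c = {k: h0 + 1}
    for j in range(k, 1, -1):
        c[j - 1] = -dot(row[j], m) + 1
    return delta, [([-a for a in row[j]], c[j]) for j in range(1, k + 1)]


# Hypothetical loop: while (x1 <= 10): x1 := x1 + x2; x2 := x2 + 1
N = [[0, 1], [0, 0]]
m = [0, 1]
h, h0, k = [1, 0], 10, 2           # the single guard row h^T x <= h0

delta, fs = nested_ranking(N, m, h, h0, k)


def f(j, x):
    coeffs, const = fs[j - 1]
    return dot(coeffs, x) + const


ok = delta > 0                     # premise G N^(k-1) m not <= 0
for x1 in range(-20, h0 + 1):
    for x2 in range(-20, 21):
        x = [x1, x2]
        x_next = [x1 + x2, x2 + 1]      # x' = (I + N) x + m
        ok = ok and f(k, x) > 0
        ok = ok and f(1, x_next) <= f(1, x) - delta
        ok = ok and all(f(j, x_next) < f(j, x) + f(j - 1, x) for j in range(2, k + 1))
print(delta, fs, ok)   # prints 1 [([0, -1], 1), ([-1, 0], 11)] True
```

For this loop the construction yields $f_1(x) = -x_2 + 1$ and $f_2(x) = -x_1 + 11$.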


Example 14 (U is not in Jordan Form). The matrix U defined in (1) and used 
in the completeness proof is generally not the Jordan normal form of the loop’s 
transition matrix M. Consider the following linear loop program. 


    while (a − b ≥ 0 ∧ b ≥ 0):
        a := 3a;
        b := b + 1;


This program is nonterminating because a grows exponentially and hence faster 
than b. It has the geometric nontermination argument 


zo=(?), xı =F ys = (77) y2 = (Ê), à = 3, Ag = 1, ja = 1. 


The matrix corresponding to the linear loop update is

$$M = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix},$$


which is diagonal (hence diagonalizable). Therefore $M$ is already in Jordan normal form. The matrix $U$ defined according to (1) is

$$U = \begin{pmatrix} 3 & 1 \\ 0 & 1 \end{pmatrix}.$$

The nilpotent component $\mu_1 = 1$ is important: there is no GNTA for this loop program with $\mu_1 = 0$, since the eigenspace to the eigenvalue 1 is spanned by $(0\ 1)^T$, which is not in $\mathcal{Y}$.
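For Example 14, the conditions (ray) and (point) can be checked numerically once concrete vectors are fixed. The Python sketch below uses one choice that we worked out by hand for illustration, $x_1 = (3, 0)^T$, $y_1 = (4, 0)^T$, $y_2 = (2, 1)^T$ with $\lambda_1 = 3$, $\lambda_2 = 1$, $\mu_1 = 1$; it is not necessarily the argument stated above, and it assumes the guard reads $a - b \ge 0 \wedge b \ge 0$. Besides $MY = YU$, $GY \le 0$, $Gx_1 \le g$ and $Mx_1 + m = x_1 + Y\mathbf{1}$, it simulates the loop and confirms that consecutive states differ by $YU^{t-1}\mathbf{1}$, which follows from $MY = YU$ by induction.

```python
def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def mat_mul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def leq(u, v):
    return all(a <= b for a, b in zip(u, v))


# Loop of Example 14 (the guard reading a - b >= 0 and b >= 0 is an assumption).
G, g = [[-1, 1], [0, -1]], [0, 0]      # guard as G x <= g
M, m = [[3, 0], [0, 1]], [0, 1]        # a := 3a; b := b + 1
# Hand-picked illustrative GNTA (not taken from the text).
x1 = [3, 0]
Y = [[4, 2], [0, 1]]                   # columns y1 = (4, 0), y2 = (2, 1)
U = [[3, 1], [0, 1]]                   # lambda1 = 3, lambda2 = 1, mu1 = 1

ones = [1, 1]
ray_ok = (mat_mul(M, Y) == mat_mul(Y, U)
          and all(v <= 0 for row in mat_mul(G, Y) for v in row))
point_ok = (leq(mat_vec(G, x1), g)
            and [a + b for a, b in zip(mat_vec(M, x1), m)]
            == [a + b for a, b in zip(x1, mat_vec(Y, ones))])

# Simulate: x_{t+1} - x_t should equal Y U^(t-1) 1, and every x_t satisfies the guard.
x, v, trajectory_ok = x1, ones, True
for _ in range(20):
    x_next = [a + b for a, b in zip(mat_vec(M, x), m)]
    trajectory_ok = (trajectory_ok and leq(mat_vec(G, x), g)
                     and [a - b for a, b in zip(x_next, x)] == mat_vec(Y, v))
    x, v = x_next, mat_vec(U, v)
print(ray_ok, point_ok, trajectory_ok)   # prints True True True
```

Setting the superdiagonal entry of `U` to 0 makes `ray_ok` fail, in line with the remark that $\mu_1 = 1$ is needed.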


5 Experiments 


We implemented our method in a tool that is specialized for the analysis of lasso programs and called ULTIMATE LASSORANKER². LASSORANKER is used by ULTIMATE BÜCHI AUTOMIZER [22], which analyzes termination of (general) C programs. BÜCHI AUTOMIZER iteratively picks lasso-shaped paths in the control flow graph, converts them to lasso programs, and lets LASSORANKER analyze them. If LASSORANKER is able to prove nontermination, a real counterexample to termination has been found; if LASSORANKER is able to provide a termination argument (e.g., a linear ranking function), BÜCHI AUTOMIZER continues the analysis, but only on lasso-shaped paths for which the termination arguments obtained in former iterations are not applicable.

² http://ultimate.informatik.uni-freiburg.de/lasso-ranker/.

We applied BÜCHI AUTOMIZER to the 803 C programs from the Termination Competition 2017³. Our constraints for the existence of a geometric nontermination argument (GNTA) were stated over the integers, and we used the SMT solver Z3 [23] with a timeout of 12 s to solve these constraints. The overall timeout for the termination analysis was 60 s. In our implementation, LASSORANKER first tries to find a fixpoint for a lasso, and only if no fixpoint exists does it try to find a GNTA that can also represent an unbounded execution. The tool was able to identify 143 nonterminating programs. For 82 of these a fixpoint was detected. For the other 61 programs the counterexample had only an unbounded execution but no fixpoint.

This experiment demonstrates that, despite the nonlinear integer constraints, the synthesis of GNTAs is feasible in practice, and that GNTAs which can also represent unbounded executions improved BÜCHI AUTOMIZER significantly: 61 of the 143 nonterminating programs were identified only through such an unbounded execution.


6 Related Work 


One line of related work is focused on decidability questions for deterministic 
lasso programs. Tiwari [38] considered linear loop programs over the reals where 
only strict inequalities are used in the guard and proved that termination is 
decidable. Braverman [5] generalized this result to loop programs that use strict 
and non-strict inequalities in the guard. Furthermore, he proved that termination 
is also decidable for homogeneous deterministic loop programs over the integers. 
Rebiha et al. [35] generalized the result to integer loops where the update matrix 
has only real eigenvalues. Ouaknine et al. [30] generalized the result to integer 
lassos where the update matrix of the loop is diagonalizable. 

Another line of related work is also applicable to nondeterministic programs 
and uses a constraint-based synthesis of recurrence sets. The recurrence sets are 
defined by templates [20,39] or the constraint is given in a second order theory 
for bit vectors [17]. These approaches can be used to find nonterminating lassos 
that do not have a geometric nontermination argument; however, this comes at
the price that for nondeterministic programs an ∃∀-constraint has to be solved.

Furthermore, there is a long line of research [2,3,8-10,12,17,26,27] that 
addresses programs that are more general than lasso programs. 


7 Conclusion 


We presented a new approach to nontermination analysis for (nondeterministic) linear lasso programs. This approach is based on geometric nontermination arguments, which are an explicit representation of an infinite execution. Unlike, e.g., a recurrence set, which encodes a set of nonterminating executions, our nontermination proof lets a user immediately see whether it encodes a fixpoint or a diverging unbounded execution. Our nontermination arguments can be found by solving a set of nonlinear constraints. In Sect. 4 we showed that the class of nonterminating linear lasso programs that have a geometric nontermination argument is quite large: it contains at least every deterministic linear loop program whose eigenvalues are nonnegative. We expect that this statement can be extended to also encompass negative and complex eigenvalues.

³ http://termination-portal.org/wiki/Termination_Competition 2017.
Abstract. To decide whether a set of states is reachable in a hybrid system, over-approximative symbolic successor computations can be used, where the symbolic representation of state sets as well as the successor computations have several parameters which determine the efficiency and the precision of the computations. Naturally, faster computations come with less precision and more spurious counterexamples. To remove a spurious counterexample, the only possibility offered by current tools is to reduce the error by re-starting the complete search with different parameters. In this paper we propose a CEGAR approach that takes as input a user-defined ordered list of search configurations, which are used to dynamically refine the search tree along potentially spurious counterexamples. Dedicated data structures allow us to extract as much useful information as possible from previous computations in order to reduce the refinement overhead.


1 Introduction 


As the correct behavior of hybrid systems with mixed discrete-continuous behav- 
ior is often safety critical, a lot of effort was put into the development and 
implementation of techniques for their analysis. In this paper we focus on tech- 
niques for proving unreachability of a given set of unsafe states. Besides methods 
based on theorem proving [11,21,25], logical encoding [13,15,22,26] and vali- 
dated simulation [12,28], flowpipe-construction-based methods [2,7,9, 17-20, 27] 
show increasing performance and usability. These methods over-approximate the 
set of states that are reachable in a hybrid system from a given set of initial states 
by executing an iterative forward reachability analysis algorithm. The result is a 
sequence of state sets whose union contains all system paths starting in any ini- 
tial state (usually for bounded time duration and a bounded number of discrete 
steps, unless a fixedpoint could be detected). 

If the resulting over-approximation does not intersect with the unsafe state 
set then the verification task is successfully completed. However, if the intersec- 
tion is not empty, due to the over-approximation the results are not conclusive. 
In this case the only possibility for achieving a conclusive answer is to change
some analysis parameters to reduce the approximation error. As a smaller error
typically comes with a higher computational effort, the choice of suitable parameters by the user can be a tedious task.

Most tools do not support the dynamic change of those parameters, thus after 
the modification of the parameters the user has to re-start the whole computa- 
tion. One of the few tools implementing some hard-coded dynamic parameter 
adaptations is the STC mode [16] of SpaceEx [17], which dynamically adapts the 
time-step size during reachability analysis to detect the enabledness of discrete 
events more precisely. Another parameter (the degree of Taylor approximations) 
is dynamically adapted in the Flow* tool [9]. The method [5], also implemented in 
SpaceEx, uses cheap (but more strongly over-approximating) computations to detect
potentially unsafe paths and uses this information to guide more precise (and
more time-consuming) computations. In [6] the authors present a method to 
automatically derive template directions when using template polyhedra as a 
state set representation in a CEGAR refinement fashion during analysis. As 
a last example, in [24] the authors use model abstraction to hide model details 
and apply model refinement if potential counterexamples are detected; after each 
refinement, the approach makes use of previous reachability analysis results and 
adapts them for the refined model, instead of a complete restart. 

However, none of the available tools supports the dynamic adjustment of
several parameters by a more elaborate strategy, which is either defined by the 
user or chosen from a pre-defined set. In this paper we propose such an approach, 
provide an implementation based on the HyPro [27] programming library, present 
some use cases to demonstrate its applicability and advantages, and discuss ideas 
for further extensions and improvements. Our main contributions are: 


— the definition of search strategies to specify the dynamic adjustment of param- 
eter configurations; 

— the formalization of a general reachability analysis algorithm with dynamic 
configuration adjustment following a search strategy, where dynamic means 
that adjustments are triggered during the analysis process in a fully auto- 
mated manner only for parts of the search where they are needed to achieve 
conclusive analysis results; 

— the identification of information, collected during reachability analysis, which 
can be re-used after a parameter adjustment to reduce the computational 
effort of forthcoming analysis steps; 

— a datatype to store information about previously completed analysis steps, 
including information about re-usability, and supporting dynamic parameter 
adjustments according to a given strategy; 

— the implementation of the reachability analysis algorithm using dynamic 
parameter adjustment and supporting information reuse;

— the evaluation of our method on some case studies. 


Outline. In Sect. 2 we recall some preliminaries on flowpipe-construction-based 
reachability analysis, before presenting our algorithm for the dynamic adjustment of parameter configurations in Sect. 3. In Sect. 4 we provide some experimental results and conclude the paper in Sect. 5.
2 Preliminaries


In this work we develop a method to dynamically adjust the parameters of 
a verification method for autonomous linear hybrid systems whose continuous 
dynamics can be described by ordinary differential equations (ODEs) of the 
form $\dot{x}(t) = A \cdot x(t)$, but our approach can be naturally extended to methods for
non-autonomous hybrid systems with external input or non-linear dynamics. 


Hybrid automata [3] are one of the modeling formalisms for hybrid systems. 
Similarly to discrete transition systems, nodes (called locations or control modi) 
model the discrete part of the state space (e.g. the states of a discrete con- 
troller) and transitions between the nodes (called jumps) labeled with guards and 
reset functions model discrete state changes. To model the continuous dynamics 
between discrete state changes, flows in the form of ordinary differential equa- 
tion (ODE) systems, and invariants in the form of predicates over the model 
variables are attached to the locations. The ODEs specify the evolution of the 
continuous quantities over time (called the flowpipe), where the control is forced 
to leave the current location before its invariant gets violated. Initial predicates 
attached to the locations specify the initial states. 

A state $\sigma = (\ell, v)$ of a hybrid automaton consists of a location $\ell$ and a variable valuation $v$. A region is a set of states $(\ell, P) = \{\ell\} \times P$. A path $\pi$ of a hybrid automaton is a sequence $\pi = \sigma_0 \xrightarrow{t_0} \sigma_1 \xrightarrow{e_1} \sigma_2 \ldots$ of time steps $\sigma_i \xrightarrow{t_i} \sigma_{i+1}$ of duration $t_i$ and discrete steps $\sigma_k \xrightarrow{e_k} \sigma_{k+1}$ following a jump $e_k$, where $\sigma_0 = (\ell_0, v_0)$ is an initial state. A state is called reachable if there exists a path leading to it.


Flowpipe-construction-based reachability analysis aims at determining the states 
that are reachable in (a model of) a hybrid system, in order to show that cer- 
tain unsafe states cannot be reached. Since the reachability problem for hybrid 
systems is in general undecidable, these methods usually over-approximate the
set of states that are reachable along paths with a bounded number of jumps 
(called the jump depth) J and a bounded time duration T (called the time hori- 
zon) between two jumps. We explain the basic ideas needed to understand our 
contributions; for further reading we refer to, e.g., [8,23]. 

Starting from an initial region $(\ell_0, V_0)$, the analysis over-approximates flowpipes and jump successors iteratively. Due to non-determinism, this generates a tree whose nodes $n_i$ are either unprocessed leaves storing a tuple $(\pi_i; \ell_i, V_i; \bot)$, or processed inner nodes storing $(\pi_i; \ell_i, V_i; V_{i,0}, \ldots, V_{i,k_i})$.

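The pair $(\ell_i, V_i)$ is the node's initial region, which is $(\ell_0, V_0)$ for the root. By $\pi_i = I_{i,0}, e_{i,0}, \ldots, I_{i,d_i}, e_{i,d_i}$, with $I_{i,j}$ being intervals and $e_{i,j}$ being jumps, we encode a set $\{\sigma_0 \xrightarrow{t_0} \sigma_0' \xrightarrow{e_{i,0}} \sigma_1 \ldots \xrightarrow{t_{d_i}} \sigma_{d_i}' \xrightarrow{e_{i,d_i}} \sigma_{d_i+1} \mid \sigma_0 \in (\ell_0, V_0),\ t_j \in I_{i,j}\}$ of paths along which $(\ell_i, V_i)$ is reachable.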

To process a node $(\pi_i; \ell_i, V_i; \bot)$, we divide the time horizon $[0, T]$ into segments $[t_{i,0}, t_{i,1}], \ldots, [t_{i,k_i}, t_{i,k_i+1}]$ with $t_{i,0} = 0$ and $t_{i,k_i+1} = T$, and for each segment $[t_{i,j}, t_{i,j+1}]$ we compute an over-approximation $V_{i,j}$ of the states reachable from $V_i$ in $\ell_i$ within time $[t_{i,j}, t_{i,j+1}]$. I.e., $R_i = \bigcup_{j=0}^{k_i} V_{i,j}$ contains all valuations reachable in location $\ell_i$ from $V_i$ within time $T$. The segmentation is usually homogeneous, meaning that the time-step size $t_{i,j+1} - t_{i,j}$ is constant, but there are also approaches for dynamic adaptations.
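A homogeneous segmentation can be illustrated in one dimension. The Python sketch below assumes scalar dynamics $\dot{x} = a \cdot x$ and an initial interval $V_i = [l, u]$; both are simplifications of the setting above, where $\dot{x}(t) = A \cdot x(t)$ is handled with general state set representations. Because $x_0 e^{at}$ is linear in $x_0$ and monotone in $t$, the values reachable within a segment form the interval spanned by the four corner combinations, which gives one interval $V_{i,j}$ per segment; the hull of all segments plays the role of $R_i$. The function name `flowpipe_segments` is hypothetical.

```python
import math


def flowpipe_segments(l, u, a, T, delta):
    """Segments [t_j, t_{j+1}] of [0, T] with constant time step delta, each paired
    with the interval of values x0 * exp(a * t) for x0 in [l, u] and t in the segment."""
    n = math.ceil(T / delta)
    segments = []
    for j in range(n):
        t0, t1 = j * delta, min((j + 1) * delta, T)
        # x0 * exp(a t) is linear in x0 and monotone in t, so extremes sit at corners.
        corners = [x0 * math.exp(a * t) for x0 in (l, u) for t in (t0, t1)]
        segments.append((t0, t1, (min(corners), max(corners))))
    return segments


segs = flowpipe_segments(l=1.0, u=2.0, a=-1.0, T=1.0, delta=0.25)
R = (min(s[2][0] for s in segs), max(s[2][1] for s in segs))  # interval hull of R_i
for t0, t1, (lo, hi) in segs:
    print(f"[{t0:.2f}, {t1:.2f}] -> [{lo:.3f}, {hi:.3f}]")
print("R_i within", R)
```

Halving `delta` doubles the number of segments, and hence of jump-successor computations, while tightening each individual $V_{i,j}$.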

The processing is completed by computing, for each flowpipe segment $V_{i,j}$ and each jump $e$ from $\ell_i$ to some $\ell'$, an over-approximation $V^e_{i,j}$ of the valuations reachable from $V_{i,j}$ by executing $e$. To store the jump successors, either we add a child node $(\pi_i, [t_{i,j}, t_{i,j+1}], e; \ell', V^e_{i,j}; \bot)$ to $n_i$ for each $V^e_{i,j} \ne \emptyset$, or we aggregate successors along a jump $e$ into a single child node $(\pi_i, [t_{i,j}, t_{i,j'}], e; \ell', R^e; \bot)$ with $V^e_{i,l} = \emptyset$ for all $l \notin [j, j'-1]$ and $\bigcup_{l \in [j, j'-1]} V^e_{i,l} \subseteq R^e$, or we cluster successors along a jump into a fixed number of child nodes (see Fig. 3).
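The three ways of storing jump successors can be contrasted on one-dimensional intervals. The Python sketch below works under simplifying assumptions of our own (interval state sets, a single jump $e$ with an identity reset, a guard given as an interval): each flowpipe segment is intersected with the guard; without aggregation every nonempty intersection becomes a child, aggregation takes the interval hull of all of them, and clustering groups consecutive successors into at most a fixed number of hulls. The name `jump_successors` and its parameters are hypothetical.

```python
import math


def intersect(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def hull(intervals):
    return (min(i[0] for i in intervals), max(i[1] for i in intervals))


def jump_successors(segments, guard, mode="none", clusters=1):
    """segments: list of (t0, t1, interval). Returns child entries
    (time interval, successor interval) for mode "none", "aggregate" or "cluster"."""
    if mode not in ("none", "aggregate", "cluster"):
        raise ValueError(f"unknown mode {mode!r}")
    hits = []
    for t0, t1, box in segments:
        succ = intersect(box, guard)      # identity reset: successor = guard intersection
        if succ is not None:
            hits.append(((t0, t1), succ))
    if mode == "none" or not hits:
        return hits                       # one child node per nonempty successor
    groups_wanted = 1 if mode == "aggregate" else clusters
    size = math.ceil(len(hits) / groups_wanted)
    groups = [hits[i:i + size] for i in range(0, len(hits), size)]
    return [((grp[0][0][0], grp[-1][0][1]), hull([s for _, s in grp])) for grp in groups]


segments = [(j, j + 1, (j, j + 2)) for j in range(6)]    # six overlapping sets
guard = (2.5, 10)
print(jump_successors(segments, guard))
print(jump_successors(segments, guard, "aggregate"))
print(jump_successors(segments, guard, "cluster", clusters=2))
```

The time interval stored with each child spans the segments it covers, mirroring the $[t_{i,j}, t_{i,j'}]$ entries of the aggregated child nodes.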

For illustration purposes, above we stored all flowpipe segments $V_{i,j}$ in the nodes. In practice they are too numerous, and if they contain no unsafe states, they are deleted. In the following, we assume that each node stores a tuple $(\pi_i; \ell_i, V_i; p)$, where the flag $p$ is 1 for processed nodes and 0 otherwise. (For a simple reachability analysis, we need to store neither the path nor the processed flag, but we will make use of the information stored in them later on. Furthermore, we could even delete the initial regions of processed nodes; however, besides counterexample and further output generation, they might also be useful for fixedpoint detection.)


State set representations are one of the core components in the above analysis procedure. In addition to storing state sets, these datatypes need to provide certain (over-approximative) operations (union, intersection, linear transformation, Minkowski sum, etc.) on state sets. Besides geometric representations (e.g., boxes/hyperrectangles, oriented rectangular hulls, convex polyhedra, template polyhedra, orthogonal polyhedra, zonotopes, ellipsoids), also symbolic representations (e.g., support functions or Taylor models) can be used for this purpose. The variety of representations is rooted in the general problem of deciding between computational effort and precision. Generally, faster computations often come at the cost of precision loss and, vice versa, more precise computations need higher computational effort. The representations might differ in their size, i.e., the required memory, which has a further influence on the computational costs for operations on these representations.

Fig. 1. Polytope (green) and box (hatched) approx. of state set $V_0$. (Color figure online)


3 CEGAR-Based Reachability Analysis 


If potential reachability of an unsafe state is detected by over-approximative 
computations, in order to achieve a conclusive verification result, we need to 
reduce the over-approximation error to an extent that allows us to determine that
the counterexample is spurious.
(a) Over-approximating the Minkowski sum of a polytope P and a box by a less complex polytope P'. (b) Smaller time steps typically lead to more precise computations (dark blue) than larger time steps (light blue).

Fig. 2. Reduction and time-step size influence the flowpipe over-approximation error. (Color figure online)


Search parameters, parameter configurations and search strategies. The size of the over-approximation error depends on various search parameters, which influence not only the precision but also the computational effort of the performed analysis:


1. State set representation: The choice of the state set representation has a very 
strong influence on both the error and the running time of the computations. 
For example, boxes are very efficient but introduce large over-approximations, 
whereas convex polyhedra are in general more precise but computationally 
much more expensive (see Fig. 1). 

2. Reductions: Some of the state set representations can grow in representation size during the computations. For example, during the analysis we need to compute the Minkowski sum $A \oplus B = \{a + b \mid a \in A \wedge b \in B\}$ of two state sets $A$ and $B$. Figure 2(a) shows a 2-dimensional example to illustrate how the representation size of a polytope P in the vertex representation (storing the vertices of the polytope) increases from 4 to 6 when building the Minkowski sum with a box. Another source of growing representation sizes are large numerators and/or denominators when using rationals to describe, for instance, coefficients of vectors. When the size of a representation gets too large, we can try to reduce it at the cost of additional over-approximation. Thus the precision/cost also depends on whether such reductions take place.

3. Time-step size: The time-step size for the flowpipe construction can be con- 
stant or dynamically adapted. In the constant case it directly determines 
the number of flowpipe segments that need to be over-approximated and for 
which jump successors need to be computed. In the case of dynamic adapta- 
tion, the adaptation heuristics determines the number of segments and thus 
the computational effort. In both cases, smaller time-step sizes often lead to 
more precise computations at the cost of higher computational effort as more
segments are computed (see Fig. 2(b)). 

4. Aggregation and clustering: The precision is higher if no aggregation takes place or if the number of clusters increases (see Fig. 3). However, completely switching off both aggregation and clustering often leads to practically intractable computational costs. Allowing a larger number of clusters can improve the precision with a manageable increase in the running times, but the number of clusters should be chosen carefully, also considering the size of the time steps (as they determine the number of flowpipe segments and thus the number of state sets to be clustered).

Fig. 3. Six sets (gray), a guard (light green), the aggregation of their intersections (left, thick line), and the clustering of their intersections into two sets (right, thick lines); both aggregation and clustering introduce additional error (dark green and dark blue). (Color figure online)

5. Splitting initial sets: Large initial state sets might be challenging for the 
reachability analysis. If the algorithm cannot find a conclusive answer, we 
can split the initial set into several subsets and apply reachability analysis to 
each of the subsets. Besides the enabling/disabling of initial state set splitting, 
also the splitting heuristics is relevant for the precision. In general, a smaller
number of initial state sets is less precise but cheaper to compute with.
Furthermore, it might also be relevant where the splitting takes place.
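The growth of the vertex representation under the Minkowski sum (item 2) and the coarser box alternative (item 1, Fig. 1) can be reproduced in a few lines of Python. The sketch below is our own illustration: it forms all pairwise vertex sums, keeps their convex hull (Andrew's monotone chain), and compares the result with its bounding box. The polytope `P` and the unit box `B` are hypothetical inputs.

```python
def convex_hull(points):
    """Hull vertices in counter-clockwise order (Andrew's monotone chain);
    collinear points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def minkowski_sum(P, Q):
    """Vertex representation of P (+) Q for convex polygons P and Q."""
    return convex_hull([(p[0] + q[0], p[1] + q[1]) for p in P for q in Q])


def bounding_box(P):
    xs, ys = [p[0] for p in P], [p[1] for p in P]
    return (min(xs), min(ys)), (max(xs), max(ys))


P = [(0, 0), (2, 0), (3, 1), (1, 1)]        # polytope with 4 vertices
B = [(0, 0), (1, 0), (1, 1), (0, 1)]        # unit box
S = minkowski_sum(P, B)
print(len(P), "->", len(S), S)              # 4 -> 6 vertices
print("box over-approximation:", bounding_box(S))
```

The box keeps only two corner points regardless of the input, which is why it is cheap, but it also contains points such as $(4, 0)$ that are not in $P \oplus B$.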


Most flowpipe-construction-based tools allow the user to define a search 
parameter configuration, fixing values for the above-listed search parameters. 
Aside from a few exceptions mentioned in the introduction, this configuration 
remains constant during the whole analysis. Whenever an unsafe state is detected 
to be potentially reachable, the user can re-start the analysis with a different 
parameter configuration to reduce the over-approximation error. 

As the executions with different parameter configurations are completely 
independent, potentially useful information from previous search processes gets 
lost. To enable the exploitation of such information, we propose an approach to 
build a connection between executions with different parameter configurations. 

Instead of a single configuration, we propose to define an ordered sequence $C_0, \ldots, C_n$ of search parameter configurations, which we call a search strategy, where the position of a parameter configuration within a search strategy is called its refinement level. Configurations at higher refinement levels should typically lead to more precise computations, but this is not a soundness requirement.
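A search strategy is an ordered list of parameter configurations whose index is the refinement level. The Python sketch below shows one possible encoding; the field names mirror the five search parameters listed above but are our own choice, as are the concrete values (for instance, using boxes at level 0). Nothing in the encoding forces later levels to be more precise, matching the remark that this is not a soundness requirement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SearchConfig:
    representation: str          # state set representation, e.g. "box", "polytope"
    time_step: float             # time-step size of the flowpipe segmentation
    reduction: bool              # allow size reductions (extra over-approximation)
    aggregation: bool            # aggregate jump successors into one child node
    clusters: Optional[int]      # number of clusters when not aggregating (None: off)
    initial_splits: int = 1      # number of parts the initial set is split into


# Hypothetical strategy C_0, C_1, C_2: position in the list = refinement level.
strategy: List[SearchConfig] = [
    SearchConfig("box", 0.1, reduction=True, aggregation=True, clusters=None),
    SearchConfig("support_function", 0.05, reduction=True, aggregation=False, clusters=4),
    SearchConfig("polytope", 0.01, reduction=False, aggregation=False, clusters=None,
                 initial_splits=4),
]


def config_at(level: int) -> Optional[SearchConfig]:
    """Configuration at the given refinement level, or None beyond the strategy."""
    return strategy[level] if 0 <= level < len(strategy) else None


print([c.representation for c in strategy], config_at(3))
```

Making the configurations frozen lets them be compared and looked up safely while the search runs.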


Dynamic configuration adaptation. We start the analysis with the first config- 
uration in the search strategy, i.e. the one at refinement level 0. If the analysis 
with this configuration can prove safety then the process is completed. 
Otherwise, if the reachability computation detects a (potentially spurious) 
counterexample then the search with the current configuration is paused; note 
that at this point there might be unprocessed nodes whose successors were not
yet computed. Now, our goal is to exclude the detected counterexample by doing 
as few computations as possible using configurations at higher refinement levels 
and, if we succeed, process those yet unprocessed nodes further at refinement 
level 0. For the first counterexample this means intuitively re-computing reach- 
ability only along the counterexample path with the configuration at refinement 
level 1; we say that we refine the path. Note that the result of a path refinement 
can be a tree, e.g. if the refinement switched off aggregation. If the counterex- 
ample could be excluded by the path refinement, then we switch back to the 
previous refinement level to process the remaining, yet unprocessed nodes. Oth- 
erwise, if the counterexample could not be excluded then we get another, refined 
counterexample; in this case we recursively try to exclude this counterexample 
by switching to the configuration at the second refinement level etc. 
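The recursive escalation described above can be summarized in a few lines. In the Python sketch below, `refine(path, level)` is a hypothetical callback standing in for re-running reachability along `path` with the configuration at `level`; it returns the refined counterexamples it finds (possibly several, since a path refinement can produce a tree, or none if the counterexample is excluded). The toy oracle, which declares each counterexample spurious from a fixed level on, is invented for illustration.

```python
import math


def exclude(path, level, num_levels, refine):
    """Try to exclude a counterexample found at `level` by refining it at level + 1,
    recursing on refined counterexamples. Returns True iff all were excluded."""
    if level + 1 >= num_levels:
        return False                       # no more precise configuration left
    return all(exclude(cex, level + 1, num_levels, refine)
               for cex in refine(path, level + 1))


# Toy oracle: counterexample "p" vanishes from level 2 on, "q" from level 1 on,
# and "r" never (e.g. a real counterexample).
SPURIOUS_FROM = {"p": 2, "q": 1}
calls = []


def toy_refine(path, level):
    calls.append((path, level))
    return [] if level >= SPURIOUS_FROM.get(path, math.inf) else [path]


results = {p: exclude(p, 0, 3, toy_refine) for p in ("p", "q", "r")}
print(results, calls)
```

When `exclude` succeeds, the full algorithm switches back to the previous level and processes the remaining unprocessed nodes there; that bookkeeping is omitted in this sketch.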

Let us first clarify what we mean by refining a counterexample path. We define 
a counterexample to be a path in the search tree. If the configuration, which 
created the counterexample, used aggregation, then refining it means determining the
flowpipes and the jump successors for the given sequence of locations (as stored in 
the nodes on the path) and jumps (as stored on the edges) with the configuration 
at the next-higher refinement level. However, if the previous configuration did 
not aggregate then we need to determine only a subset of the jump successors, 
namely those whose time point is covered by the counterexample. 

Now let us discuss what it means to refine a path by doing as few computations as possible. If we find a counterexample at a refinement level $i$, then we need a refinement for the whole path at level $i+1$. However, another counterexample detected previously at level $i$ might share a prefix with the current one; if the previous counterexample has already been refined, then we need to refine only the not-yet-refined postfix of the current counterexample.
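Finding the not-yet-refined postfix amounts to a longest-prefix lookup among the paths already refined at level $i+1$. The Python sketch below keeps those paths in a small trie (a hypothetical helper class for illustration, not the single refinement tree described next) and returns how many leading steps can be reused; the steps are abstract labels standing for the (interval, jump) pairs of a path.

```python
class RefinedPaths:
    """Trie of paths that were already refined at one refinement level."""

    def __init__(self):
        self.root = {}

    def add(self, path):
        node = self.root
        for step in path:                  # refining a path also refines its prefixes
            node = node.setdefault(step, {})

    def refined_prefix_length(self, path):
        node, k = self.root, 0
        for step in path:
            if step not in node:
                break
            node, k = node[step], k + 1
        return k


level_1 = RefinedPaths()
level_1.add(("e1", "e2", "e3"))             # counterexample refined earlier
cex = ("e1", "e2", "e4", "e5")              # new counterexample at level 0
k = level_1.refined_prefix_length(cex)
print(k, cex[k:])                           # prints 2 ('e4', 'e5')
```

Only the postfix `cex[k:]` then has to be recomputed at the higher level, starting from the regions already obtained for the shared prefix.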

The analysis at refinement level 0 and each path refinement computation generate a search tree. To reduce the computational effort as much as possible, we have to exchange information between these search trees. For example, for a given counterexample found at refinement level $i$ we need to know whether a prefix of it was already refined at level $i+1$. To allow such information exchange, we could store each search tree separately and extract information from the trees when needed by traversing them. This option requires the least management overhead during reachability computations, but it has major drawbacks in terms of the computational costs for tree traversal. Alternatively, we could store each search tree separately but additionally store refinement relations between their nodes, allowing us to relate paths and retrieve information more easily. However, we would have high costs for setting up and storing all node relations. Instead, we decided to collect all information in a single refinement tree. Tree updates require a careful management of the refinement nodes and their successors, but the advantage is that information about previous searches is more easily accessible.

In the following, we first discuss how nodes of the refinement tree are processed and how paths in the refinement tree are refined, and finally we explain our dynamic parameter refinement algorithm.
The algorithm. Each refinement tree node $n_i$ is a kind of “meta-node” that contains an ordered sequence $(n_i^0,\dots,n_i^{u_i})$ with $0 \le u_i \le n$, where $n+1$ is the size of the search strategy, and each entry $n_i^j$ has the form $(\pi; \ell, V; p)$ as explained in Sect. 2.

Assume for simplicity that the model has a single initial region $(\ell_0, X_0)$, and let $V_{0,i}$ represent $X_0$ according to the state set representation of refinement level $i$. The refinement tree is initialized with a root node $n_0 = (n_0^0,\dots,n_0^n)$ with $n_0^i = (\epsilon; \ell_0, V_{0,i}; 0)$.
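The meta-node structure and its initialization can be written down as plain data types. The Python sketch below is an illustration only: the class names `Entry` and `MetaNode` and the helpers `make_root` and `initial_task_list` are hypothetical, state sets are opaque placeholders, and `None` stands for a dummy entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

@dataclass
class Entry:
    """One entry n_i^j = (pi; l, V; p) of a refinement-tree meta-node."""
    path: Tuple          # pi: jumps (and timing intervals) leading to this entry
    location: str        # l
    state_set: object    # V, in the representation used at this level (placeholder)
    processed: int = 0   # p: 0 = flowpipe not computed yet, 1 = done

@dataclass
class MetaNode:
    """Refinement-tree node with one entry per refinement level; None is a dummy entry."""
    entries: List[Optional[Entry]] = field(default_factory=list)
    children: List["MetaNode"] = field(default_factory=list)

def make_root(initial_location: str, initial_sets: Sequence[object]) -> MetaNode:
    """Root n_0 = (n_0^0, ..., n_0^n) with n_0^i = (epsilon; l_0, V_{0,i}; 0).

    initial_sets[i] stands for X_0 converted to the representation of level i.
    """
    return MetaNode(entries=[Entry((), initial_location, v) for v in initial_sets])

def initial_task_list(root: MetaNode) -> List[Tuple[MetaNode, int, Tuple]]:
    """The task list initially holds only (n_0; 0; epsilon); () encodes epsilon."""
    return [(root, 0, ())]
```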

We additionally introduce a task list, which is initialized to contain only $(n_0; 0; \epsilon)$. An element $(n_i; j; \pi)$ in the task list records that we need to compute successors for the $j$th element of the refinement node $n_i$ at level $j$. If $\pi = \epsilon$ then we are not refining and need to consider all successors for further computations; otherwise we are at a refinement level $j > 0$ and only the successors along the counterexample path $\pi$ need to be considered.

We remove and process elements from the task list one by one. Assume we consider the task list element $(n_i; j; \pi')$ with $n_i^j = (\pi; \ell, V; p)$.

If $p = 0$ then we over-approximate the flowpipe starting from $V$ in $\ell$ for the time horizon $T$, using the configuration at level $j$ in the search strategy.

If the computed flowpipe segments contain no bad states and the jump depth $J$ is not yet reached, then we also compute the jump successors. Depending on the clustering/aggregation settings at level $j$, this yields a set of jump successor regions $R_1,\dots,R_m$ with $R_k = (\ell_k, V_k)$ over time intervals $I_1,\dots,I_m$ along jumps $e_1,\dots,e_m$. If the number $m'$ of children of $n_i$ is less than $m$ then we add $m - m'$ new children; if $m' > 0$ then we add to the newly created children as many dummy entries (containing empty sets) as the other children have, in order to bring all children to the same refinement level. After that, we select for each $k = 1,\dots,m$ a different child $\hat n_k$ of $n_i$ and append $(\pi, I_k, e_k; \ell_k, V_k; 0)$ to the child’s entry sequence (see Fig. 4). If $m' > m$ then we add a dummy entry to all children that were not selected (to which no new entry was added). Finally, we set $p$ to 1.
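The bookkeeping for a changing number of children can be sketched as follows. In this hypothetical Python version each child is represented only by its list of entries, `None` is a dummy entry, and the first $m$ children are the ones selected for the $m$ successors (the text only requires that a different child is selected for each successor).

```python
from typing import List, Optional

def attach_successors(children: List[List[Optional[str]]],
                      new_entries: List[str]) -> List[List[Optional[str]]]:
    """Append one entry per jump successor to the children of a processed node.

    children:    entry lists of the existing m' children (all of equal length)
    new_entries: the m new entries (pi, I_k, e_k; l_k, V_k; 0), here opaque strings
    None marks a dummy entry (empty state set).
    """
    m, m_old = len(new_entries), len(children)
    width = len(children[0]) if children else 0
    # If m' < m, create m - m' new children, padded with dummies to the common length.
    for _ in range(m - m_old):
        children.append([None] * width)
    # Child k receives entry k; surplus children (m' > m) receive a dummy entry.
    for k, child in enumerate(children):
        child.append(new_entries[k] if k < m else None)
    return children
```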

If the node could be processed without discovering any bad states (or if p 
was already 1 and thus processing was not needed) then we update the task list 
as follows: 


- If $\pi' = \epsilon$ then we have to process all successor nodes at the level $j'$ determined by the number of entries $E$ in each of the nodes $\hat n_k$. We add $(\hat n_k; E; \epsilon)$ to the task list for all $k = 1,\dots,m$.

- Otherwise, if $\pi' = I, e, \pi''$ then we add $(\hat n_k; j; \pi'')$ to the task list for all $k = 1,\dots,m$ for which $I_k \cap I \neq \emptyset$ and $e_k = e$.
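The two update rules can be sketched as a pure function over the children of a processed node. This is a hypothetical illustration, not the authors' code: a refinement path $\pi'$ is assumed to be a tuple of `(interval, jump)` pairs with `()` for $\epsilon$, time intervals are closed, and, as an assumption about the entry-count rule, a child that is not on a refinement path continues at the index of its newest entry.

```python
from typing import List, Tuple

Interval = Tuple[float, float]   # closed time interval [lo, hi]

def intersects(a: Interval, b: Interval) -> bool:
    """True if the closed intervals a and b overlap."""
    return max(a[0], b[0]) <= min(a[1], b[1])

def next_tasks(children: List[Tuple[str, int, Interval, str]], j: int,
               pi_prime: Tuple) -> List[Tuple[str, int, Tuple]]:
    """children: (child_id, number_of_entries, I_k, e_k) for each child n_k.

    Returns the new task list elements (child_id, level, remaining path).
    """
    if not pi_prime:
        # pi' = epsilon: follow every successor, at the index of its newest entry (assumption).
        return [(cid, count - 1, ()) for cid, count, _, _ in children]
    (I, e), rest = pi_prime[0], pi_prime[1:]
    # pi' = I, e, pi'': follow only children reached via e with timing overlapping I.
    return [(cid, j, rest) for cid, _, I_k, e_k in children
            if e_k == e and intersects(I_k, I)]
```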


Note that if $\pi' = \epsilon$ but $j > 0$ then we have just succeeded in refining a spurious counterexample from level $j-1$ to a safe path at level $j$, and we can continue further successor computations using a lower-level configuration. This switch to a lower level happens because the children $\hat n_k$ of $n_i$ have fewer than $j$ entries in their queues. Now the processing is completed and the next element from the task list can be handled.
(a) Node refinement adds child nodes. (b) Node refinement removes child nodes.


Fig. 4. Tree update after node refinement with changing number of child nodes and 
transition timing refinement. 
(a) Safe path. (b) Counterexample. (c) Refinement.


Fig. 5. Partial tree refinement to remove a spurious counterexample. 


If during processing of $(n_i; j; \pi')$ with $n_i^j = (\pi; \ell, V; p)$ the computed flowpipe had a non-empty intersection with the set of unsafe states, then we have found a counterexample at level $j$. If $j = n$ then the highest refinement level has been reached and the algorithm terminates without a conclusive answer. Otherwise, if $j < n$, we repeat the computations along the counterexample path with a higher-level configuration (see Fig. 5). This is implemented by adding $(n_0; j+1; \pi, \pi')$ to the task list.

The main structure of the algorithm is shown in Algorithm 1.1. 


3.1 Incrementality 


The efficiency of the presented approach can be further improved by implement- 
ing incrementality: already available book-keeping and additional information 
gained throughout the computation can be exploited to speed up later refine- 
ments. 

For example, the presented approach already keeps track of time intervals where jumps were enabled, i.e. the time intervals during which the intersection of a state set and the guard condition was non-empty. Assume we process $(n; i; \pi')$ at level $i$ with $n^i = (\pi; \ell, V; p)$ being the $i$th entry in $n$. Let $J$ be the union of all the time intervals of all flowpipe segments for which a non-empty jump successor was computed along a jump $e$. Later, when processing $(\hat n; j; \hat\pi')$ at level $j > i$ with $\hat n^j = (\hat\pi; \ell, \hat V; \hat p)$ being the $j$th entry in $\hat n$, if the path set encoded by $\hat\pi$ is included in the path set encoded by $\pi$, then we need to compute jump successors along $e$ only for flowpipe segments over time intervals that have a non-empty intersection with $J$.
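The incremental guard check amounts to simple interval arithmetic. A minimal Python sketch, assuming closed time intervals and hypothetical function names: `merge` builds the union $J$ of the recorded enabling intervals, and `segments_needing_guard_check` keeps only those flowpipe segments of a later, finer run whose time interval meets $J$.

```python
from typing import List, Tuple

Interval = Tuple[float, float]   # closed time interval [lo, hi]

def merge(intervals: List[Interval]) -> List[Interval]:
    """Union of closed intervals as a sorted list of disjoint intervals (the set J)."""
    merged: List[Interval] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def segments_needing_guard_check(segments: List[Interval],
                                 enabled: List[Interval]) -> List[Interval]:
    """Keep only the flowpipe segments whose time interval intersects J."""
    J = merge(enabled)
    return [s for s in segments
            if any(max(s[0], a) <= min(s[1], b) for a, b in J)]
```

With enabling intervals $[0.3, 0.4]$, $[0.35, 0.45]$ and $[1.1, 1.2]$, only the segments $[0.25, 0.5]$ and $[1.0, 1.25]$ of a run with step size $0.25$ need a guard intersection.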
analyze() {
  while (true) do
    if (task list is empty) then
      return safe
    fi;
    take an element (n_i; j; π') with n_i^j = (π; ℓ, V; p) from task list;
    if (p = 0) then
      R := computeFlowpipeSegments(ℓ, V, j)
    fi;
    if (p = 0 and R contains unsafe states) then
      if (j = n) then return unknown;
      addToTaskList((n_0; j + 1; π, π'))
    else
      if (jump depth not yet reached) then
        computeJumpSuccessorsAndUpdateTaskList(n_i, j, π', R)
      fi
    fi
  od
}


Algorithm 1.1. Reachability analysis algorithm with backtracking and refinement. 
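The control flow of Algorithm 1.1 can be tried out on a toy model. The Python sketch below is a strongly simplified, hypothetical version: it keeps a set of processed (path, level) pairs instead of the meta-node refinement tree, ignores time intervals, abstracts the flowpipe computation into an oracle `unsafe(loc, level)`, and lets successors beyond a refined path fall back to level 0 instead of applying the entry-count rule.

```python
from collections import deque
from typing import Callable, List, Tuple

def analyze(successors: Callable[[str], List[Tuple[str, str]]],
            unsafe: Callable[[str, int], bool],
            n: int, jump_depth: int, init: str) -> str:
    """Simplified sketch of Algorithm 1.1 (no meta-node tree, no time intervals).

    successors(loc) -> [(jump_label, target_loc), ...]   hypothetical model interface
    unsafe(loc, j)  -> True if the level-j flowpipe from loc meets the bad states
                       (hypothetical oracle standing in for computeFlowpipeSegments)
    """
    done = set()                        # (path, level) pairs whose flag p is 1
    tasks = deque([(init, (), 0, ())])  # (location, pi, level j, refinement path pi')
    while tasks:
        loc, path, j, rest = tasks.popleft()
        if (path, j) not in done:       # p = 0: flowpipe still has to be computed
            if unsafe(loc, j):
                if j == n:
                    return "unknown"    # highest refinement level reached
                # Counterexample: restart at the root one level higher,
                # following pi and then the remaining refinement path pi'.
                tasks.append((init, (), j + 1, path + rest))
                continue
            done.add((path, j))
        if len(path) >= jump_depth:     # jump depth reached
            continue
        for label, target in successors(loc):
            if not rest:
                # Not refining: follow every jump; simplification: fall back to level 0.
                tasks.append((target, path + (label,), 0, ()))
            elif label == rest[0]:
                # Refining: follow only the next jump of the counterexample.
                tasks.append((target, path + (label,), j, rest[1:]))
    return "safe"
```

On a three-jump toy graph where location `l2` looks unsafe only at level 0, the spurious counterexample disappears at level 1 and the result is `safe`; if `l2` stays unsafe up to level $n$, the result is `unknown`.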


Table 1. Strategies $s_i$ with different refinement levels (lvl.). Strategies vary the time step size ($\delta$) and the state set representation (box, sf = support function). Strategy $s_5$ changes aggregation and clustering (n = no aggregation, c: max. number of successor nodes).

Strategy | lvl. 0 ($\delta$, rep.) | lvl. 1 ($\delta$, rep.) | lvl. 2 ($\delta$, rep.) | lvl. 3 ($\delta$, rep.)
s0 | .01, box | .001, box | .0001, sf | -
s1 | .01, box | .001, box | .01, sf | .001, sf
s2 | .01, box | .001, box | .01, sf | .0001, sf
s3 | .1, box | .001, sf | - | -
s4 | .1, box | .001, poly | - | -
s5 | .1, box | .001, box | .001, sf | -


Similarly, if $(\ell, V)$ contains no unsafe states but $(\ell, \hat V)$ does, then we know that the latter counterexample is spurious if the path set encoded by $\hat\pi$ is included in the path set encoded by $\pi$.

A similar observation holds for flowpipe segments: if a segment in the flowpipe of $(\ell, V)$ is empty, which happens if the invariant is violated, then we know that the same segment of the flowpipe from $(\ell, \hat V)$ will also be empty.


4 Experimental Results 


In order to show the general applicability of our approach, we have conducted several experiments on an implementation of the method presented in Sect. 3. We have used our implementation to verify safety of several well-known benchmarks using different strategies (see Table 1). All experiments were carried out on an Intel Core i7 (4 × 4 GHz) CPU with 16 GB RAM. Results for the used strategies can be found in Table 2.
Benchmarks. Different benchmarks from the area of hybrid systems verification were selected. The well-known bouncing ball benchmark models the height and velocity of a falling ball bouncing off the ground. The added set of bad states constrains the height of the ball after the first of 4 bounces. This benchmark already exhibits most properties that more challenging benchmarks cover, while being simple enough to serve as a sanity check for our method.

The 5-D switching system [10] is an artificially created model with 5 locations and 5 variables, which shows more complex dynamics and is well suited to show the differences in over-approximation error between the used state set representations. We added a set of bad states in the last location, where the system’s trajectories converge to a certain point.

The navigation benchmark [14] models the velocity and position of a point 
mass moving through cells on a two-dimensional plane (we used variations of 
instances 9 and 11). Each cell (location) exhibits different dynamics influencing
the acceleration of the mass. The goal is to show that a set of good states 
can potentially be reached while a set of bad states will always be avoided 
(see Fig. 6(b)). The initial position of the mass is chosen from a set, such that 
this benchmark demonstrates non-determinism for the discrete transitions which 
results in a more complex search tree. 

The platoon benchmark [1,4] models a vehicle platoon of three cars where two controlled cars follow the first one while keeping the distances $e_i$ between each other within a certain threshold (see Fig. 6(a)). This benchmark was chosen as it combines a higher-dimensional state space with more complex dynamics.


Strategies. During the development of our approach we tested several strategies with varying parameters: (a) the state set representation, (b) the time step size and (c) aggregation settings. In general, other parameters (e.g. initial set splitting) could also be considered, but our prototype does not yet support these. For this evaluation we selected six strategies $s_0,\dots,s_5$, which mostly vary (a) and (b) (see Table 1). Changing aggregation settings has proven challenging for the tree update mechanism, and due to the exponential blow-up of the number of tree nodes it was not effective in practice. Furthermore, with aggregation disabled, the largest precision gain can be observed for boxes, while for all other tested state set representations the effect is negligible. Note that our prototype implements the general case, where time step sizes are not necessarily monotonically decreasing and multiples of each other, which implies that refinement starts from the root node.


Comparison. We compare our refinement algorithm (1) with a classic approach where no refinement is performed. To achieve this, we specify only a single strategy element for our algorithm. We give results for (2) the fastest successful setting (of the respective strategy), which an experienced user would choose, and for (3) the setting with the highest precision level, which a conservative user would select. The three entries per cell in Table 2 show the running times for our dynamic approach (gray), the fastest successful setting and the conservative approach. The numbers in brackets show the number of nodes in the search tree; for refinement strategies we give the number of nodes for each refinement level.
Table 2. Experimental results in seconds for different strategies. Timeout (TO) was
set to 10min, memout (MO) to 4GB, (err) marks numerical errors. Three results per 
cell: (1) dynamic refinement (gray), (2) fastest successful setting only, (3) most precise 
setting. In brackets: Number of nodes in the search tree, refinement runs give the 
number of nodes on each level. 


Bm | Setting | s0 | s1 | s2 | s3 | s4 | s5
BB1 | (1) | 0.15 (5/2/0) | 0.15 (5/2/0/0) | 0.15 (5/2/0/0) | 0.46 (5/2) | 1.58 (5/2) | 0.21 (29/4/0)
BB1 | (2) | 0.22 (5) | 0.18 (5) | 0.18 (5) | 0.97 (5) | 3.45 (5) | 1.71 (121)
BB1 | (3) | 11.93 (5) | 0.97 (5) | 9.90 (5) | 0.97 (5) | 3.45 (5) | 9.47 (63)
Na09 | (1) | RO | 5.76 (279/6/4/0) | 5.09 (317/17/6/0) | 549 | err | RO
Na09 | (2) | TO | 118 (244) | 118 (244) | TO | TO | MO
Na09 | (3) | TO | TO | TO | TO | TO | TO
Na11 | (1) | TO | 7.15 (45/8/7/0) | 7.61 (75/16/7/0) | 163.4 (73/11) | err | 120 (75/4168/0)
Na11 | (2) | TO | 6.4 (24) | 6.4 (24) | 395 (24) | TO | 130 (4170)
Na11 | (3) | TO | 395 (24) | TO | 395 (24) | TO | TO
5DS | (1) | 2.27 (5/5/5) | 0.49 (5/5/5/5) | 2.3 (?) | 0.39 (5/5) | 15.31 (5/5) | 0.45 (5/64/5)
5DS | (2) | 2.35 (5) | 0.38 (5) | 2.36 (5) | 0.38 (5) | TO (5) | 0.37 (5)
5DS | (3) | 2.35 (5) | 0.38 (5) | 2.36 (5) | 0.38 (5) | TO (5) | 0.37 (5)
Plat. | (1) | 173 (5/4/4) | 3.67 (5/4/4/0) | 3.6 (5/4/4/5) | 18.7 (5/4) | EO | 19.16 (5/4/4)
Plat. | (2) | TO | 3.48 (5) | 3.48 (5) | 18.9 (5) | TO | 18.8 (5)
Plat. | (3) | TO | 18.9 (5) | TO | 18.9 (5) | TO | 18.8 (5)


Observations. The results in Table 2 show that our method is in general competitive with classical approaches: with dynamic refinement, the running times are of the same order of magnitude as for the fastest setting, and in some cases our method is even faster. From the results we can draw several conclusions:


- Our implementation currently supports re-using information about guard intersection timings (see Sect. 3.1), while other information, such as time intervals where a state set is fully contained in the set defined by the invariant of a location, is not used. Keeping track of this reduced information already noticeably influences the running times, as costly intersection operations for transition guards can be avoided for most computed segments, and the running times can compete with the optimal setting. This shows that the additional cost of pre-computing parts of the search tree can be compensated in terms of running time when information is properly re-used.

- The length of the counterexample plays a significant role. In the bouncing ball benchmark the set of bad states is reachable after one discrete transition and never again afterwards, while in the 5-D switching system the set of bad states is reached in the last reachable location, which causes a refinement of the whole tree, so that a recovery to a lower refinement level is not possible. In the platoon benchmark, stepping back to a lower refinement level does not provide any advantage, as an intersection with the set of bad states occurs before transition timings can be recorded (see Fig. 6(a)). To overcome this problem, a future implementation should allow for additional entry points for refinement in order to reduce the length of the refinement path (see Sect. 5).

- The shape of the search tree influences the effectiveness of our approach. As the navigation benchmark is the only benchmark in our set where the resulting search tree naturally branches due to multiple outgoing transitions per location, the effect of partial refinement can especially be observed for this benchmark. Whole subtrees can be cut off and are shown to be unreachable on higher refinement levels, such that the number of nodes is reduced. The presented method is most effective for systems exhibiting non-determinism, which is reflected in a strongly branching search tree.

- Coarse analysis allows for fast discovery of the search tree, possibly requiring more nodes to be computed. We can observe that for models with non-determinism the number of nodes at the highest required level is lower than when using the classical approach. Together with the running times, this confirms our assumption that putting effort into selective, partial refinement of single branches pays off in terms of computational effort.

(a) Platoon benchmark (variables $t$ and $e_1$) for strategy $s_3$. Refinements (blue) increase in saturation. Discrete jumps occur at multiples of 5t. Bad states are $e_1 < 42$ (bottom, red). (b) Navigation benchmark (instance 9) with strategy $s_1$. The set of bad states (left box, red); the set of good states (bottom box, green); refinements of strategy $s_1$ (blue, orange).

Fig. 6. Result plots for the platoon and the navigation benchmarks with refinement. (Color figure online)


In conclusion, we expect that a strategy in which a coarse analysis precedes a fine-grained setting (e.g. strategy $s_3$), allowing enabled transitions to be detected quickly and a fast recovery after the removal of a spurious counterexample, shows good results on average.


5 Conclusion 


We presented a reachability analysis algorithm with dynamic configuration adjustment, which allows search configurations to be refined to obtain conclusive results, but exploits as much information as possible from previous computations in order to keep the computational effort as low as possible. We plan to continue our work in several directions:
Incrementality. Our current implementation re-uses information from previous refinement levels about the time intervals of jump enabledness. We will also implement the re-use of information about when an invariant is definitely true or definitely violated (when the flowpipe segment for a time interval was fully contained in or fully outside the invariant set).

Additional parameters. The current implementation supports 3 parameters in search strategies: time step size, state set representation, and aggregation and clustering settings. We aim to extend our search strategies with the adjustment of further parameters.

Dynamic strategy synthesis. Automatically deriving strategies for partial path refinement from information about a counterexample, e.g. the Hausdorff distance between the set of bad states and the state set intersecting it, could be investigated further.

Parameter synthesis. With little modification we can also use our approach to synthesize the coarsest parameter setting that still allows safety to be verified. This can be achieved by strategies where the parameter settings decrease in precision and the analysis stops when a bad state is potentially reachable.

Partial path refinement. Partial refinement of counterexamples, for example 
restricted to a suffix, could possibly improve the effectiveness of the approach (if 
the refinement of the suffix renders a bad state unreachable). 

Conditional strategies. We defined search strategies to be ordered sequences of 
parameter configurations, which are used one after the other for refinements. 
Introducing trees of configurations with conditional branching would allow even 
more powerful strategies where the characteristics of the system or runtime infor- 
mation (like previous refinement times, state set sizes, number of sets aggregated 
etc.) can be used to determine which branch to take for the next refinement. 
Abstract. We introduce in this paper AMT 2.0, a tool for qualitative and quantitative analysis of hybrid continuous and Boolean signals
that combine numerical values and discrete events. The evaluation of 
the signals is based on rich temporal specifications expressed in extended 
Signal Temporal Logic (xSTL), which integrates Timed Regular Expres- 
sions (TRE) within Signal Temporal Logic (STL). The tool features 
qualitative monitoring (property satisfaction checking), trace diagnostics 
for explaining and justifying property violations and specification-driven 
measurement of quantitative features of the signal. 


1 Introduction 


Cyber-physical systems, such as automotive embedded controllers, medical 
devices or autonomous vehicles, are often modeled and analyzed by simulation. 
Simulators generate traces admitting real values often interpreted as continuous- 
time signals. To evaluate the system under design, these traces are inspected for 
satisfying some correctness requirements and are often subject to quantitative 
analysis based on recording some values in certain segments of the signal and 
performing some computation (summation, minimum) on them. 

Over the past decade an extensive framework has been developed whose goal 
was to bring automated support for this tedious and error-prone task, centered 
around Signal Temporal Logic (STL) [18,19]. STL extends the classical LTL in 
two directions: it uses predicates over real-valued variables in addition to atomic 
propositions, and it is defined over dense continuous time accessed symbolically 
with timed modalities as in Metric Temporal Logic (MTL) [17]. This framework, 
which was initially accompanied by a rudimentary prototype tool [20], had a lot 
of reported applications in domains such as automotive, robotics, analog circuits, 
systems biology. It can be viewed as an extension of runtime verification toward 
cyber-physical hybrid systems. Interested readers may consult the survey in [7]. 


© The Author(s) 2018 
D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 303-319, 2018. 
https://doi.org/10.1007/978-3-319-89963-3_18
In this article we present AMT 2.0, a new version of the tool. The new
version is much more mature in terms of software engineering aspects such as 
rigorous typing of signals and properties, introducing programming language 
features that include declarations and aliases, improvement of the graphical 
editors, systematic software testing, etc. Furthermore, its functionality has been 
extended significantly by incorporating several new research results obtained 
over the last years: 


1. We combine STL with a fragment of Timed Regular Expressions (TRE) [4,5], 
as a complementary formalism to express temporal patterns. The monitoring 
algorithm for our specification language xSTL thus obtained integrates the 
recent TRE pattern matching algorithm reported in [22]. 

2. We use the TRE formalism to define segments of the signal to which quan- 
titative measurements should be applied. Thus we obtain a declarative mea- 
surement language that does for the quantitative domain what formal spec- 
ification languages do for correctness checking. The results, first reported in 
[14], are fully incorporated into the tool. 

3. We implement the error diagnostics algorithm of [13] which accompanies the 
report on a property violation with a justification: a small sub-signal (tem- 
poral implicant) which is sufficient to imply the property violation and to 
convince the user of this fact. 


With all these features we progress in easing the task of designers who seek 
to analyze a complex system based on simulations, providing them with an 
alternative to manual inspection or explicit programming of observers. 

The rest of the paper is organized as follows. In Sect. 2 we present the xSTL 
specification language. Section 3 gives an overview of the tool and its main features. We illustrate the usage of AMT 2.0 in Sect. 4 with two examples. We
present the related work in Sect. 5 and give concluding remarks in Sect. 6. 


2 Extended Signal Temporal Logic 


Extended Signal Temporal Logic (xSTL) essentially combines STL with a variant of TRE. In this section, we provide the mathematical definitions of the
specification language. 

We denote by $P$ and $X$ finite sets of propositional and data variables, such that $|P| = m$ and $|X| = n$. Data variables are defined over an arbitrary domain $\mathbb{D}$, typically the reals or the integers. We use the notation $w : \mathbb{T} \to \mathbb{D}^n \times \mathbb{B}^m$ to represent a multi-dimensional signal with $\mathbb{T} = [0, d) \subseteq \mathbb{R}$ and $\mathbb{B} = \{\mathit{true}, \mathit{false}\}$. We denote by $w_p$ the projection of $w$ on its component $p$. We denote by $\theta : \mathbb{D}^n \to \mathbb{B}$ a predicate that maps valuations of variables in $X$ into $\{\mathit{true}, \mathit{false}\}$.

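The syntax of an STL formula $\varphi$ with both future and past temporal operators and interpreted over $X \cup P$ is defined by the grammar

$$\varphi := p \mid \theta(x_1,\dots,x_n) \mid \neg\varphi \mid \varphi_1 \vee \varphi_2 \mid \varphi_1\,\mathcal{U}_I\,\varphi_2 \mid \varphi_1\,\mathcal{S}_I\,\varphi_2$$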
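where $p \in P$, $x_1,\dots,x_n \in X$ and $I \subseteq \mathbb{R}^+$ is an interval. We denote by $\mathcal{U}$ the until operator decorated with the unbounded interval $(0,\infty)$. We use the strict semantics [2] for the until and since temporal operators, which allows us to define (continuous-time) next $\bigcirc\varphi = \varphi\,\mathcal{U}\,\varphi$ and (continuous-time) previous $\ominus\varphi = \varphi\,\mathcal{S}\,\varphi$. The instantaneous rise and fall events can be derived using the rules $\uparrow\varphi = \ominus\neg\varphi \wedge \bigcirc\varphi$ and $\downarrow\varphi = \ominus\varphi \wedge \bigcirc\neg\varphi$. We derive other standard operators as follows: $\mathit{true} = p \vee \neg p$, $\mathit{false} = \neg\mathit{true}$, $\varphi_1 \wedge \varphi_2 = \neg(\neg\varphi_1 \vee \neg\varphi_2)$, $\varphi_1 \rightarrow \varphi_2 = \neg\varphi_1 \vee \varphi_2$, $\Diamond_I\varphi = \mathit{true}\,\mathcal{U}_I\,\varphi$, $\blacklozenge_I\varphi = \mathit{true}\,\mathcal{S}_I\,\varphi$, $\Box_I\varphi = \neg\Diamond_I\neg\varphi$, and $\blacksquare_I\varphi = \neg\blacklozenge_I\neg\varphi$.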
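For piecewise-constant Boolean signals, the rise and fall events defined above occur exactly at the switching points of the signal: just before a rise the signal is false and just after it is true. The following Python sketch extracts these points; the breakpoint encoding of the signal and the function name are assumptions for illustration and not AMT's input format or API.

```python
from typing import List, Tuple

def switching_points(signal: List[Tuple[float, bool]]) -> Tuple[List[float], List[float]]:
    """Rise and fall time points of a piecewise-constant Boolean signal.

    `signal` is a list of (t_i, v_i) pairs sorted by time; value v_i is assumed to
    hold from t_i up to the next breakpoint (a hypothetical encoding).
    """
    rises, falls = [], []
    for (_, before), (t, after) in zip(signal, signal[1:]):
        if not before and after:      # false just before t, true just after t
            rises.append(t)
        elif before and not after:    # true just before t, false just after t
            falls.append(t)
    return rises, falls
```

A breakpoint that does not change the value (such as `(3.0, True)` following `(1.5, True)`) produces neither a rise nor a fall.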

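The semantics of an STL formula with respect to a signal $w$ is described via the satisfiability relation $(w,t) \models \varphi$, indicating that the signal $w$ satisfies $\varphi$ at time point $t$, according to the following definition.

$$
\begin{array}{lcl}
(w,t) \models p & \leftrightarrow & w_p[t] = \mathit{true}\\
(w,t) \models \theta(x_1,\dots,x_n) & \leftrightarrow & \theta(w_{x_1}[t],\dots,w_{x_n}[t]) = \mathit{true}\\
(w,t) \models \neg\varphi & \leftrightarrow & (w,t) \not\models \varphi\\
(w,t) \models \varphi_1 \vee \varphi_2 & \leftrightarrow & (w,t) \models \varphi_1 \text{ or } (w,t) \models \varphi_2\\
(w,t) \models \varphi_1\,\mathcal{U}_I\,\varphi_2 & \leftrightarrow & \exists t' \in (t + I) \cap \mathbb{T} : (w,t') \models \varphi_2 \text{ and } \forall t < t'' < t' : (w,t'') \models \varphi_1\\
(w,t) \models \varphi_1\,\mathcal{S}_I\,\varphi_2 & \leftrightarrow & \exists t' \in (t - I) \cap \mathbb{T} : (w,t') \models \varphi_2 \text{ and } \forall t' < t'' < t : (w,t'') \models \varphi_1
\end{array}
$$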

We now define a variant of TRE according to the following grammar:

$$r := \epsilon \mid p \mid \theta(x_1,\dots,x_n) \mid r_1 \cdot r_2 \mid r_1 \cup r_2 \mid r_1 \cap r_2 \mid r^* \mid \langle r \rangle_I \mid r_1 ? r_2 \mid r_1 ! r_2$$


where $I$ is an interval of $\mathbb{R}^+$. The semantics of a timed regular expression $r$ with respect to a signal $w$ and times $t \le t'$ in $[0, d]$ is given in terms of a match relation $(w,t,t') \models r$, which indicates that the segment of $w$ between $t$ and $t'$ matches the expression. This relation is defined inductively as follows:


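$$
\begin{array}{lcl}
(w,t,t') \models \epsilon & \leftrightarrow & t = t'\\
(w,t,t') \models p & \leftrightarrow & t < t' \text{ and } \forall t'' \in (t,t') : w_p[t''] = \mathit{true}\\
(w,t,t') \models \theta(x_1,\dots,x_n) & \leftrightarrow & t < t' \text{ and } \forall t'' \in (t,t') : \theta(w_{x_1}[t''],\dots,w_{x_n}[t'']) = \mathit{true}\\
(w,t,t') \models r_1 \cdot r_2 & \leftrightarrow & \exists t'' : t \le t'' \le t',\ (w,t,t'') \models r_1 \text{ and } (w,t'',t') \models r_2\\
(w,t,t') \models r_1 \cup r_2 & \leftrightarrow & (w,t,t') \models r_1 \text{ or } (w,t,t') \models r_2\\
(w,t,t') \models r_1 \cap r_2 & \leftrightarrow & (w,t,t') \models r_1 \text{ and } (w,t,t') \models r_2\\
(w,t,t') \models r^* & \leftrightarrow & \exists k \ge 0 : (w,t,t') \models r^k\\
(w,t,t') \models \langle r \rangle_I & \leftrightarrow & (w,t,t') \models r \text{ and } t' - t \in I\\
(w,t,t') \models r_1 ? r_2 & \leftrightarrow & (w,t,t') \models r_2 \text{ and } \exists t'' < t : (w,t'',t) \models r_1\\
(w,t,t') \models r_1 ! r_2 & \leftrightarrow & (w,t,t') \models r_1 \text{ and } \exists t'' > t' : (w,t',t'') \models r_2
\end{array}
$$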


The last two operations associate a pre-condition (resp. post-condition) to 
the expression. We note that with the pre- and post-condition, we can also 
syntactically define rise and fall operators by using the rules $\uparrow p = \neg p \,?\, \epsilon \,!\, p$ and $\downarrow p = p \,?\, \epsilon \,!\, \neg p$. Extended STL specifications require regular expressions to be embedded into STL formulas. We define two operators, begin match ($@(r)$) and end match ($(r)@$), that intuitively project any signal segment $(t, t')$ that matches the expression $r$ to its beginning $t$ and its end $t'$, respectively. Thus, xSTL simply extends STL with these two operators:


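$$\varphi := p \mid \theta(x_1,\dots,x_n) \mid \neg\varphi \mid \varphi_1 \vee \varphi_2 \mid \varphi_1\,\mathcal{U}_I\,\varphi_2 \mid \varphi_1\,\mathcal{S}_I\,\varphi_2 \mid @(r) \mid (r)@$$

and with the following semantics:

$$
\begin{array}{lcl}
(w,t) \models @(r) & \leftrightarrow & \exists t' \ge t : (w,t,t') \models r\\
(w,t) \models (r)@ & \leftrightarrow & \exists t' \le t : (w,t',t) \models r
\end{array}
$$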


3 Tool Presentation 


The AMT 2.0 tool provides for qualitative and quantitative analysis of simula- 
tion/measurement traces. Its input consists of two major ingredients. The first 
is typically a formula or a collection of formulas in xSTL specifying the desired 
properties (and later measurements) of a continuous signal. The second is a finite 
representation of the continuous signal. Input signals obtained from simulators or 
measurement devices are given as finite sequences of time-stamped values of the 
form $(t_i, w[t_i])$. The tool supports two commonly used formats: Value Change
Dump (vcd) and Comma Separated Values (csv) files. To obtain continuous-time 
signals, values between sampling points are interpolated inside the tool to yield 
either piecewise-constant or piecewise-linear signals. 
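The two interpolation schemes can be illustrated with a short Python function. AMT 2.0 itself is written in Java, so this is only a sketch of the idea, not the tool's code: the function name `interpolate` and the `mode` parameter are hypothetical, and samples are assumed to be sorted by strictly increasing time.

```python
import bisect
from typing import List, Tuple

def interpolate(samples: List[Tuple[float, float]], t: float,
                mode: str = "linear") -> float:
    """Value at time t of the continuous-time signal induced by samples (t_i, w[t_i]).

    mode "constant": hold the most recent sample value (piecewise-constant signal)
    mode "linear":   join consecutive samples by straight lines (piecewise-linear)
    """
    if mode not in ("constant", "linear"):
        raise ValueError("mode must be 'constant' or 'linear'")
    times = [ti for ti, _ in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the sampled time range")
    k = bisect.bisect_right(times, t) - 1   # index of the last sample with t_k <= t
    tk, vk = samples[k]
    if mode == "constant" or k == len(samples) - 1:
        return vk
    t_next, v_next = samples[k + 1]
    return vk + (v_next - vk) * (t - tk) / (t_next - tk)
```

For the samples $(0, 1)$, $(2, 3)$, $(4, 3)$, the piecewise-linear signal has value 2 at $t = 1$, whereas the piecewise-constant signal still has value 1 there.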

The tool can work either interactively via its graphical user interface (GUI) 
or, alternatively, in batch mode when we want to monitor against many sig- 
nals or incorporate monitoring in a more sophisticated analysis procedure that 
may iterate over behavior-generating models and/or properties in an outer loop. 
Figure 1 shows the main evaluation window of the GUI which provides two main 
functionalities: (1) editing xSTL specifications; and (2) launching the monitor- 
ing procedure by selecting properties and signals and presenting the outcome 
graphically. The AMT 2.0 tool is entirely implemented in Java to facilitate its 
usage across different platforms and operating systems. 

The tool supports three main functionalities: (1) qualitative offline monitor- 
ing of extended STL specifications; (2) localization and explanation of property 
violations; and (3) measurements of quantitative features of signals driven by 
temporal patterns expressed using TRE. In the remainder of the section we
present these functionalities in more detail. 
Fig. 1. AMT 2.0 - an overview of the graphical user interface.


3.1 Specifications in AMT 2.0 


The tool facilitates specification of xSTL properties in several ways. The GUI 
provides an xSTL editor, depicted in Fig. 2, with syntax highlighting and line 
numbering. In addition, the xSTL parser implements a number of features bor- 
rowed from programming languages. This includes (1) declaration of variables 
and constants, (2) parameterized property templates, (3) support for Boolean, 
real and integer variables and (4) type checking with extensive error reporting. 


3.2 Qualitative Monitoring of xSTL 


In this section, we sketch the algorithm for the major functionality of the tool, 
qualitative monitoring of xSTL specifications. The procedure is based on two 
main methods that we describe in the sequel: the offline marking procedure for 
STL [19] and the pattern matching procedure for TRE [22]. 
bool trigger;
real vara;
real varb;
real varc;
real vard;
real vare;

const real vh = 5.0;
const real vl = 8.2;

assertion one:
always((vara <= vh) and (rise(trigger) -> (eventually[0:600] always[0:300] vara <= vl)));

assertion two:
always((varb <= vh) and (rise(trigger) -> (eventually[0:600] always[0:300] varb <= vl)));

assertion three:
always((varc <= vh) and (rise(trigger) -> (eventually[0:600] always[0:300] varc <= vl)));

assertion four:
always((vard <= vh) and (rise(trigger) -> (eventually[0:600] always[0:300] vard <= vl)));

assertion five:
always((vare <= vh) and (rise(trigger) -> (eventually[0:600] always[0:300] vare <= vl)));


Fig. 2. AMT 2.0 - xSTL editor. 


The qualitative monitoring procedure for STL is an offline method that works directly on the input signals. The procedure is recursive on the structure of the specification: it propagates the truth values from input signals via sub-formulas up to the main formula. The algorithm uses the notion of a satisfaction signal: we assign to each sub-formula $\psi$ of $\varphi$ a Boolean signal $w_\psi$ such that $w_\psi[t] = \mathit{true}$ iff $(w,t) \models \psi$. For each STL operator, we define a method that computes its satisfaction signal from the satisfaction signals of its arguments. For some operators, this computation is trivial. For example, the satisfaction signal $w_{\neg\psi}$ is obtained by flipping the truth values of the satisfaction signal $w_\psi$. The computation of satisfaction signals for temporal operators is more involved. We give an intuition on the computation of $w_\psi$ where $\psi = \Diamond_I\varphi$ and refer the reader to [19] for the technical description of the complete procedure. The computation is based on the following observation: whenever $\varphi$ holds throughout an interval $J$, $\psi$ holds throughout $(J \ominus I) \cap \mathbb{T}$, where $J \ominus I = \{t - t' \mid t \in J \text{ and } t' \in I\}$ is the Minkowski difference. Hence, the essence of the procedure is to back-shift (Minkowski difference restricted to $\mathbb{T}$) all the positive intervals in $w_\varphi$, and thus obtain the set of time points where $\Diamond_I\varphi$ holds. This method is illustrated in Fig. 3.
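The back-shifting step for $\Diamond_I\varphi$ with a closed interval $I = [a,b]$ can be sketched in a few lines of Python: each positive interval $[j_0, j_1]$ of $w_\varphi$ becomes $[j_0 - b, j_1 - a]$, is clipped to the signal domain, and overlapping results are merged. This sketch assumes that satisfaction signals are given as sorted lists of closed intervals (it ignores the distinction between open and closed endpoints) and clips to $[0, d]$; the function name is hypothetical and this is not the implementation of [19].

```python
from typing import List, Tuple

Interval = Tuple[float, float]   # closed interval [lo, hi]

def eventually(positive: List[Interval], a: float, b: float, d: float) -> List[Interval]:
    """Positive intervals of <>_[a,b] phi, computed from the positive intervals of phi.

    Each interval J of phi is back-shifted to J (-) [a,b] = [j0 - b, j1 - a]
    (Minkowski difference), clipped to the domain [0, d], and the results are merged.
    """
    shifted = []
    for j0, j1 in positive:
        lo, hi = max(j0 - b, 0.0), min(j1 - a, d)
        if lo <= hi:
            shifted.append((lo, hi))
    merged: List[Interval] = []
    for lo, hi in sorted(shifted):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged
```

For example, if $p$ holds on $[4,5]$ and $[6,8]$, the back-shifted intervals $[2,4]$ and $[4,7]$ merge, so $\Diamond_{[1,2]}p$ holds on $[2,7]$.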




Fig. 3. Example of satisfaction signal computation for $\Diamond_{[1,2]} p$ using back-shifting.
The integration of TRE into the monitoring procedure of xSTL is done in two steps. First, we define the match-set $M(r, w)$ of a TRE $r$ over a signal $w$ as the set of all segments of $w$ that match $r$, i.e. $M(r,w) = \{(t,t') \mid (w,t,t') \models r\}$, and use the algorithm of [22] to compute the match-set. We then use the match-begin ($@(r)$) and match-end ($(r)@$) operators to project the match-sets to satisfaction signals that are then directly integrated into the STL monitoring procedure described above.

The algorithm proposed in [22] computes the set of segments of a signal $w$ that match a TRE $r$. Since we are dealing with continuous-time signals, the number of segments is uncountable and so, potentially, is the number of matches. The algorithm is based on the observation that all those segments can be embedded in two-dimensional space, inside the triangle $0 \le t \le t' \le |w|$, where a point $(t,t')$ represents the segment starting at $t$ and ending at $t'$. The matching algorithm uses a symbolic representation of the matches as a finite union of two-dimensional zones. Zones are a special class of convex polytopes, defined as conjunctions of inequalities of the form $x_i \prec b_i$ and $x_i - x_j \prec c_{i,j}$, where $\prec \in \{<, \le\}$. For instance, the match set $M(\epsilon,w)$ for the empty word $\epsilon$ is the diagonal zone $\{(t,t') \in \mathbb{T} \times \mathbb{T} \mid t = t'\}$, while the match set for a literal $p$ or $\neg p$ is a disjoint union of triangles touching the diagonal, whose number depends on the number of switching points in $w_p$. The match set of the time restriction operator is obtained by intersecting the match set with the corresponding diagonal band, hence $M(\langle r \rangle_I,w) = M(r,w) \cap \{(t,t') \mid t' - t \in I\}$. The match sets for $p$ and $\langle p \rangle_{[1,9]}$


(a) (b) 


Fig. 4. Example of a match set - (a) p; and (b) (p)j1,9). 
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are depicted in Fig. 4. We point the reader to [22] for a complete description of 
the procedure. The satisfaction signals wa(r) and w(rj@ for the match-begin and 
match-end operators are computed from the match set of r by projecting every 
(t,t) € M(r) on t and t’, respectively. 
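A simplified sketch of the zone representation, restricted to the triangles produced by a literal p: each maximal positive interval [a, b] of w_p yields the zone a ≤ t ≤ t′ ≤ b, time restriction adds the diagonal band lo ≤ t′ − t ≤ hi, and the match-begin and match-end signals are the projections on t and t′. The class `TriangleZone`, its methods and `literal_match_set` are hypothetical names; general zones, strict bounds and the other TRE operators handled by [22] are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TriangleZone:
    """Zone {(t, t') | a <= t <= t' <= b, lo <= t' - t <= hi}, closed bounds only."""
    a: float
    b: float
    lo: float = 0.0
    hi: float = float("inf")

    def contains(self, t: float, t2: float) -> bool:
        return self.a <= t <= t2 <= self.b and self.lo <= t2 - t <= self.hi

    def restrict(self, lo: float, hi: float) -> Optional["TriangleZone"]:
        """Time restriction <.>_[lo,hi]: intersect with the band lo <= t' - t <= hi."""
        new_lo = max(self.lo, lo)
        new_hi = min(self.hi, hi, self.b - self.a)  # no segment is longer than b - a
        return TriangleZone(self.a, self.b, new_lo, new_hi) if new_lo <= new_hi else None

    def begin_times(self) -> Tuple[float, float]:
        """Projection on t (match-begin): t in [a, b - lo]."""
        return (self.a, self.b - self.lo)

    def end_times(self) -> Tuple[float, float]:
        """Projection on t' (match-end): t' in [a + lo, b]."""
        return (self.a + self.lo, self.b)


def literal_match_set(pos: List[Tuple[float, float]]) -> List[TriangleZone]:
    """Match set of a literal p: one triangle per maximal positive interval of w_p."""
    return [TriangleZone(a, b) for a, b in pos]


zones = [z.restrict(1, 9) for z in literal_match_set([(1, 4), (6, 12)])]
print([z.begin_times() for z in zones])  # [(1, 3), (6, 11)]
```

The projections are exact for a triangle with a duration band: a begin time t is feasible iff t + lo ≤ b, and symmetrically for end times.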


3.3 Trace Diagnostics for STL 


The trace diagnostics procedure implements the algorithm presented in [13]. Given an STL formula φ and a trace w that violates φ, the procedure gives an explanation of the fault in the form of a temporal implicant, which is a small sub-signal w′ of w that is sufficient to imply the violation. In other words, any possible completion of w′ into a full signal violates the property. The diagnostics procedure uses the satisfaction signals computed by the monitoring algorithm from Sect. 3.2 to explain the faults. The method uses the satisfaction explanation operator E (and its dual, the violation explanation operator F) that, for a given formula φ, returns an implicant of φ (respectively of ¬φ) which is satisfied by w. The explanation operators are defined inductively on the structure of the formula φ and on the times t at which explanations of its sub-formulas are required.

We illustrate the idea behind the procedure with the following example. Consider the STL specification φ = ◇_{[0,1]} p and a signal w in which p does not hold during [0,3) and then holds during [3,5). It is clear, for instance, that (w, 0) ⊭ φ and (w, 3) ⊨ φ. The violation of φ by w at time 0 can be explained by the fact that w_p is continuously false throughout the interval [0,1]. In other words, we have that F(φ, w, 0) = ⋀_{t∈[0,1]} (w_p[t] = false). In contrast, the value of w_p at any time t ∈ [3,4] is sufficient to explain the satisfaction of φ by w at time 3. Thus E(φ, w, 3) could be any (w_p[t] = true) such that t ∈ [3,4]. We use the notion of a selection function to choose one explanation when there are many possible ones. The full algorithm is described in [13].
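For the eventually operator of this example, both explanation operators reduce to interval arithmetic. The following Python sketch assumes closed positive intervals for w_p (so [3,5) is treated as [3,5]) and uses an "earliest witness" selection function; the function name `explain_eventually` and this choice of selection function are illustrative assumptions, not necessarily those of [13].

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # closed interval; endpoint openness is ignored


def explain_eventually(pos: List[Interval], a: float, b: float, t: float):
    """Explain the value of <>_[a,b] p at time t, given the positive intervals of w_p.

    Returns ("sat", t_star) with w_p[t_star] = true, chosen by an earliest-witness
    selection function, or ("viol", (lo, hi)) meaning w_p is false on all of [lo, hi].
    """
    lo, hi = t + a, t + b
    witnesses = [max(lo, p_lo) for p_lo, p_hi in pos if p_lo <= hi and p_hi >= lo]
    if witnesses:
        return ("sat", min(witnesses))
    return ("viol", (lo, hi))


# p false on [0,3), true on [3,5); phi = <>_[0,1] p
print(explain_eventually([(3, 5)], 0, 1, 0))  # ('viol', (0, 1))
print(explain_eventually([(3, 5)], 0, 1, 3))  # ('sat', 3)
```

A violation needs the whole window as evidence, whereas a satisfaction needs only one time point, which is why the selection function is only relevant for E.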


3.4 Specification-Driven Measurements 


In this section, we present a simple declarative measurement specification lan- 
guage [14] built on top of TRE. The idea is to require the signal segments over 
which measurements should be taken to be those that match some pattern speci- 
fied by an expression. An example of a measurement is the time elapsed between 
the beginning and end of some activity, or the total fuel consumption in a seg- 
ment where the acceleration pedal is continuously on until the velocity crosses 
some threshold. 

We first recall that the match set of a TRE defines all the trace segments that match the expression, and the number of those can be uncountably infinite. However, if we restrict ourselves to patterns that are delimited by instantaneous discrete events, we will have only finitely many matches. Formally, we use the following sub-class of expressions. An event-bounded TRE (E-TRE) is an expression of the form

$$\varphi ::= \uparrow p \mid \downarrow p \mid \varphi_1 \cdot r \cdot \varphi_2 \mid \varphi_1 \cup \varphi_2 \mid \varphi \cap r$$

with p a proposition, r a TRE, and φ_1, φ_2 event-bounded TREs.
The measure patterns defining the segments to be measured are of the form α ? φ ! β, where φ is the main pattern, and α and β are, respectively, pre- and post-conditions. The main pattern φ specifies the portion of the signal over which the measure is taken. To guarantee a finite number of matching segments, φ is restricted to be an E-TRE, while α and β, which can be used to define additional constraints, are TREs.

Given a measure pattern φ and a signal w, we first compute all the segments of w that match φ. We then apply a measuring operator that collects specific signal values over the matched segments. A measure is written with the syntax op(φ) with op ∈ {time, value_x, duration, inf_x, sup_x, integral_x, average_x}. We finally aggregate the specific measures and provide the user with the minimum, maximum and average measured value, as well as a histogram that summarizes the measurements.
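The aggregation step can be illustrated with a small Python sketch that applies the duration measure to a list of already matched segments and reports the count, minimum, maximum, average and a histogram. The segment list, the fixed histogram bin width and the names `duration` and `summarize` are assumptions made for the example; the matching itself is not shown.

```python
import math
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # one matched segment (t, t')


def duration(seg: Segment) -> float:
    """The duration measure of a matched segment."""
    t, t2 = seg
    return t2 - t


def summarize(values: List[float], bin_width: float) -> Dict[str, object]:
    """Aggregate per-segment measures: count, min, max, average and a histogram."""
    bins = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "average": mean(values),
        "histogram": dict(sorted(bins.items())),  # lower bin edge -> number of measures
    }


segments = [(0.0, 2.0), (2.0, 4.5), (4.5, 6.5)]  # hypothetical matches
report = summarize([duration(s) for s in segments], bin_width=1.0)
print(report["count"], report["min"], report["max"])  # 3 2.0 2.5
```

Other operators such as average_x or integral_x would replace `duration` with a function that reads signal values over [t, t′]; the aggregation stays the same.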

We illustrate specification-driven measurement with an example from the DSI3 automotive communication protocol [16]. The micro-controller and the sensors that use the protocol communicate by sending analog pulses during the protocol initialization phase. The standard describes the acceptable shapes and duration of such pulses. Figure 5 depicts the specification of a discovery response pulse from the DSI3 standard. In particular, the standard defines the relevant thresholds (2I_Resp and I_Resp), which are used to describe the shape, as well as the acceptable duration of the pulse's ramp (t_1) and its total duration (t_2).

To define the pulse pattern we first define the following predicates: 


$$i_h \equiv i > 2I_{\mathit{Resp}} \qquad i_m \equiv I_{\mathit{Resp}} < i < 2I_{\mathit{Resp}} \qquad i_l \equiv i < I_{\mathit{Resp}}$$


and then let 

g = i? T(in) : ly j th -tb : ICi) li. 
We finally apply the measure operation duration(φ) to extract the duration of the segments that match the pulse pattern.


Fig. 5. Discovery response pulse from DSI3. 


312 D. Ničković et al.


4 Examples 


In this section, we introduce two running examples that we use to illustrate the 
features and the functionalities of AMT 2.0. The first example is concerned 
with a mixed-signal bounded stabilization property and is used to illustrate the 
qualitative monitoring and trace diagnostics functionalities. The second example 
demonstrates the measurement functionality as applied to jitter in a digital clock. 


4.1 Mixed-Signal Bounded Stabilization 


Informal Requirements. This requirement states that after every rising edge 
of the Boolean trigger, the usually-stable analog signal var is allowed to oscillate 
under the following conditions: 


1. var must always remain below 5 V; and
2. within 600 s, var must go below 0.2 V and then continuously remain under that threshold for at least 300 s.


Simulation Traces. We evaluate this requirement on 5 different simulation 
traces. Figure 6 depicts the Boolean trigger signal, as well as the 5 traces named 
var0 to var4. We can already reason informally about the satisfaction of the 
bounded stabilization property by these traces: 


Fig. 6. Bounded stabilization - input signals.
1. Trace var0 violates the specification because the signal never stabilizes, i.e. it continues oscillating until the end of the simulation;

2. Trace var1 satisfies the specification: the signal always remains smaller than 5 V, and it goes below 0.2 V within 600 s, continuously remaining below that threshold until the end of the simulation;

3. Trace var2 violates the specification because the signal exceeds 5 V;

4. Trace var3 violates the specification because the signal does not stabilize below 0.2 V within the specified period; and

5. Trace var4 violates the specification because of the 3 glitches that occur towards the end of the simulation.




Formal Specification in xSTL. To define the property, we first declare the Boolean variable trigger, as well as the real variables var0 to var4. We also declare two constants vh and vl, representing the 5 V and 0.2 V thresholds, respectively. Since we evaluate the same formula over different signals, we define a generic property template stabilization for the bounded stabilization formula, which is the conjunction of conditions (1) and (2) of the informal requirements. The first conjunct says that the real-valued signal must not exceed 5 V. The second conjunct is a conditional formula that uses logical implication. It says that whenever the trigger signal is on its rising edge, the x signal must go below 0.2 V within 600 s and continuously remain below that threshold for at least 300 s. Each assertion is then an instantiation of the template with one of the signals var0 to var4.


bool trigger;
real var0;
...
real var4;
const real vh = 5;
const real vl = 0.2;

template bool stabilization(bool tg, real x, real vhigh, real vlow) {
    bool result = ((x <= vhigh) and (rise(tg) -> (eventually[0:600] always[0:300] x <= vlow)));
    return result;
}

assertion one:
    always(stabilization(trigger, var0, vh, vl));
...
assertion five:
    always(stabilization(trigger, var4, vh, vl));
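To make the semantics of the template concrete, here is a discrete-time Python approximation of stabilization and of an always assertion over it. It assumes a trace sampled at unit time steps, treats windows that extend beyond the end of the trace as not satisfied, and defines rise(tg) as a false-to-true change between consecutive samples. AMT 2.0 itself works on continuous-time signals, so verdicts near sample boundaries may differ; the function names are hypothetical.

```python
from typing import List, Sequence


def stabilization(tg: Sequence[bool], x: Sequence[float], vhigh: float, vlow: float,
                  ev: int = 600, al: int = 300) -> List[bool]:
    """Per-sample verdicts of the stabilization template on a unit-step sampled trace.

    result[t] = (x[t] <= vhigh) and (rise(tg)[t] -> eventually[0:ev] always[0:al] x <= vlow).
    Windows that run past the end of the trace count as not satisfied.
    """
    n = len(x)
    # run[t] = length of the run of consecutive samples starting at t with x <= vlow
    run = [0] * (n + 1)
    for t in range(n - 1, -1, -1):
        run[t] = run[t + 1] + 1 if x[t] <= vlow else 0
    always_low = [run[t] >= al + 1 for t in range(n)]  # x <= vlow on samples t..t+al
    result = []
    for t in range(n):
        rise = t > 0 and tg[t] and not tg[t - 1]
        stable = any(always_low[t:t + ev + 1])  # some u in [t, t+ev] starts a stable window
        result.append(x[t] <= vhigh and (not rise or stable))
    return result


def assertion(tg, x, vhigh=5.0, vlow=0.2, ev=600, al=300) -> bool:
    """always(stabilization(tg, x, vh, vl)) evaluated over the whole trace."""
    return all(stabilization(tg, x, vhigh, vlow, ev, al))


# Tiny traces with the windows shrunk to [0:2] and [0:2] for readability
tg = [False] + [True] * 7
ok = [1.0, 1.0] + [0.1] * 6
glitch = [1.0, 1.0, 0.1, 0.1, 1.0, 0.1, 0.1, 0.1]
print(assertion(tg, ok, ev=2, al=2), assertion(tg, glitch, ev=2, al=2))  # True False
```

The glitch at sample 4 breaks every candidate window of length 3 that starts within [1, 3], which mirrors why var4 violates the property.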


Qualitative Monitoring of the Specification. We illustrate the qualitative monitoring of the property over the traces, as done using the GUI of the tool. In the evaluation configuration window, we first specify the xSTL specification, the simulation traces and an optional alias file. In addition to setting up the inputs, we also select the Float representation of the real numbers, the Linear interpolation and the Single Explanation feature of the diagnostics module.

After evaluating the specification on the traces, we can visually depict the 
results, as shown in Fig. 1. The nodes in the xSTL parse tree view are expandable 
via a double click. By expanding the assertions node of the specification, we 
can see that assertion two is satisfied, while assertions one, three, four and five 
are violated. We note that we can visualize the satisfaction signals for any sub- 
property of the specification. 


Fault Explanation. The fault explanation is given in the form of temporal implicants, which are (small) sub-segments of the input signals that are sufficient to imply the property violation. Figure 7 illustrates the visual output of the diagnostics procedure in AMT 2.0 for the bounded stabilization specification. The first two figures show the trace diagnostics report for the third assertion. We can see that the trigger signal does not contribute to the fault, but var2 does at a single point in time within the interval [100,150]. At that time, var2 is greater than the invariant threshold 5 V, which explains the property violation. The last two figures show the same report, but for the fifth assertion. In this case, the fault is explained by the fact that signal trigger gets high at time 100 and by the values of signal var4 at times 350, 600 and 750. We can see that the last two times coincide with the glitches, thus witnessing that var4 never continuously holds below 0.2 V for at least 300 time units.

We note that the tool computes the fault explanations in a hierarchical man- 
ner, following the parse tree of the formula. This additional and complementary 
information can be quite useful in understanding the fault. We finally note that 
the trace diagnostics can be made hierarchic. 


4.2 Digital Clock Jitter 


Informal Requirements. Given a continuous-time Boolean-valued signal clock, a clock period is defined as a segment that starts with a rising edge of the clock and ends with the next rising edge. The measurement specification measures the duration of all the clock periods matched within the clock signal in order to assess the clock jitter.


Simulation Trace. We apply the specifications to a Boolean clock signal, see 
Fig. 8. 


Fig. 8. Digital clock jitter - a segment of the input signal.

Formal Specification in xSTL. We now formalize the measurement specification for the digital clock jitter analysis in xSTL. We first declare the Boolean variable clock, as well as its negation nclock. We then specify the pattern clock_period, which consists of a concatenation that starts with the rising edge of clock (start(clock)), followed by an interval of positive duration where clock holds, followed by another interval of positive duration where nclock holds, and ending with the next rising edge of clock. Finally, we declare the actual measurement to be taken as duration(clock_period), which extracts the durations of all signal segments that match the clock_period pattern.
bool clock;
bool nclock = not clock;

measurement jitter_clock_period {
    pattern clock_period = start(clock):clock:nclock:start(clock);
    measure duration(clock_period);
}

measurement jitter_clock_period_c {
    pattern clock_period_c = start(clock):{clock:nclock}[19000:21000]:start(clock);
    measure duration(clock_period_c);
}
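The clock_period pattern has only finitely many matches because it is delimited by rising edges. The following Python sketch of that matching step assumes the clock is given as a sorted list of (time, value) change points: a match runs from one rising edge to the next, with a falling edge strictly in between, and the optional `bounds` argument imitates the [19000:21000] restriction of clock_period_c. The function `clock_periods` and the sample clock are invented for illustration.

```python
from typing import List, Optional, Tuple


def clock_periods(changes: List[Tuple[float, bool]],
                  bounds: Optional[Tuple[float, float]] = None) -> List[Tuple[float, float]]:
    """Segments matching start(clock):clock:nclock:start(clock).

    `changes` lists (time, value) pairs for a piecewise-constant Boolean clock,
    sorted by time, where each entry after the first is a value change.
    A match runs from one rising edge to the next, with a falling edge strictly
    in between. `bounds` optionally mimics the {clock:nclock}[lo:hi] restriction.
    """
    rises = [t for i, (t, v) in enumerate(changes) if i > 0 and v and not changes[i - 1][1]]
    falls = [t for i, (t, v) in enumerate(changes) if i > 0 and not v and changes[i - 1][1]]
    matches = []
    for r1, r2 in zip(rises, rises[1:]):
        high_then_low = any(r1 < f < r2 for f in falls)
        in_bounds = bounds is None or bounds[0] <= r2 - r1 <= bounds[1]
        if high_then_low and in_bounds:
            matches.append((r1, r2))
    return matches


clock = [(0, False), (10, True), (20, False), (30, True), (41, False), (51, True), (60, False)]
periods = clock_periods(clock)
durations = [t2 - t1 for t1, t2 in periods]
print(periods, durations)  # [(10, 30), (30, 51)] [20, 21]
print(clock_periods(clock, bounds=(19, 20.5)))  # [(10, 30)]
```

The resulting durations can then be aggregated into the count, minimum, maximum, average and histogram reported for Fig. 9.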


Pattern-Driven Measurements. The visualization of the measurement spec- 
ification consists of a histogram depicting the distribution of the measures taken 
over signal segments that match the pattern, the total number of matched seg- 
ments, as well as the minimum, maximum and average value of the measures. 
The visual summary of the clock jitter measurement is shown in Fig. 9. 


duration(start(clock):clock:not(clock):start(clock))
Total number of segments: 299
Maximum value: 21877.0
Minimum value: 17241.0
Average value: 20002.046822742475

Fig. 9. Digital clock jitter - measurements.


5 Related Work 


Breach [11] is a MATLAB/Simulink toolbox that enables various types of STL specification analysis. In particular, Breach supports falsification-based testing, parameter synthesis and requirement mining of STL properties. S-TaLiRo [3] is another Simulink/MATLAB toolbox for various kinds of robustness analysis of MTL specifications. It provides support for falsification-based testing, parameter mining, runtime verification, conformance testing, computing the worst expected robustness for stochastic systems and debugging of formal requirements. The ViSpec [15] tool, associated with S-TaLiRo, allows visual specification of MTL requirements. BIOCHAM [10] is a tool for inferring unknown (biological) model parameters from temporal logic constraints. The authors of [9] extend STL with freeze quantifiers that allow them to express oscillatory properties. Similar oscillatory properties of heart behavior are studied using quantitative regular expressions (QRE) in [1].

Montre [21] is a prototype tool for TRE pattern matching. It provides support for both offline and online matching. AMT 2.0 implements the offline matching algorithms used by Montre and adds a specification measurement language on top of them. Montre does not provide support for STL, monitoring or trace diagnostics.

The combination of STL and TRE was inspired by the Property Specification Language (PSL) [12] and SystemVerilog Assertions (SVA) [23] standards used in digital hardware verification. Both PSL and SVA use the suffix implication operator to combine temporal logic with regular expressions. In contrast, we define match-begin and match-end operators that give us more freedom to decide whether the beginning or the end of an expression match is relevant for the property. The only other work that combines temporal logic and regular expressions in the context of continuous-time applications is presented in [8], where the authors propose metric dynamic logic as the specification language for reasoning about time-event sequences.


6 Conclusion 


We introduced in this paper the AMT 2.0 tool for qualitative and quantitative analysis of traces coming from cyber-physical systems applications. The tool uses an expressive specification language based on a combination of STL and TRE and admits qualitative monitoring, trace diagnostics and property-driven measurements as its main functionalities. The development of the tool is a continuous work in progress and there are a number of features which are planned to be developed in the near future, in particular solving the inverse problem of finding parameters in a formula template that lead to satisfaction by a given signal or a set of signals [6].
Abstract. We provide an efficient algorithm for multi-objective model-checking problems on Markov decision processes (MDPs) with multiple
cost structures. The key problem at hand is to check whether there exists 
a scheduler for a given MDP such that all objectives over cost vectors 
are fulfilled. Reachability and expected cost objectives are covered and 
can be mixed. Empirical evaluation shows the algorithm’s scalability. 
We discuss the need for output beyond Pareto curves and exploit the 
available information from the algorithm to support decision makers. 


1 Introduction 


Markov decision processes [41] (MDPs) with rewards or costs are a popular 
model to describe planning problems under uncertainty. Planning algorithms 
aim to find strategies which perform well (or even optimally) for a given objective. These algorithms typically assume that a goal is reached eventually [41,45]. This, however, is unrealistic in many scenarios, e.g. due to insufficient resources or the possibility of failing actions. Furthermore, these policies often admit single runs which perform far below the user's expectation, which is unsuitable
in many scenarios with high stakes. Examples range from deliveries reaching an 
airport after the plane’s departure to more serious scenarios in e.g. wildfire man- 
agement [1]. In particular, many scenarios call for minimising the probability to 
run out of resources before reaching the goal: while it is beneficial for a plane to 
reach its destination with low expected fuel consumption, it is essential to reach 
its destination with the fixed available amount of fuel. 

Policies that optimise solely for the probability to reach a goal are mostly very 
expensive. Even in the presence of just a single cost structure, decision makers 
have to trade the success probability against the costs. This makes many plan- 
ning problems inherently multi-objective [12,17]. In particular, safety properties 
cannot be averaged out by good performance [21]. Planning scenarios in various 
application areas [44] have different resource constraints. Typical examples are 
energy consumption and time [11], or optimal expected revenue and time [38] in 
robotics, and monetary cost and available capacity in logistics [17]. 
task   time   energy consumption   scientific value   prob.
1      high   {low, medium}        medium             1/2
2      low    {medium, high}       medium             3/5
3      low    low                  low                4/5
4      high   high                 high               1/10

(a) Different possible tasks for a Mars rover; (b) Tradeoffs in costs

Fig. 1. Science on Mars: planning under several resource constraints


Illustrative Example. Consider a simplified (discretised) version of the Mars rover 
task scheduling problem [11]. The task is to plan a variety of experiments for 
a day on Mars. The experiments vary in their success probability, time, energy 
consumption and their scientific value upon success. The time, energy consump- 
tion, and scientific value are uncertain and modelled by probability distributions, 
cf. Fig. 1(a). The objective is to achieve a minimum of daily scientific progress 
while limiting the risk of running out of time or out of energy. As the rover is 
expected to work for a longer period, we prefer a high expected scientific value. 


Contributions and approach. This paper focuses on multi-objective cost-bounded 
reachability queries on MDPs, a natural setting for the aforementioned planning problems. The input is an MDP with multiple cost structures (e.g. energy, utility or time) and multiple objectives of the form "maximise/minimise the probability to reach a state in G_i such that the cumulative cost for the i-th cost structure is below/above a threshold b_i". This multi-objective variant of
cost-bounded reachability is PSPACE-hard [43]. The focus of this paper is on 
the practical side: we aim at finding a practically efficient algorithm to obtain 
(an approximation of) the Pareto-optimal points. To accomplish this, we adapt 
and generalise recent approaches for the single-objective case [27,34] towards 
the multi-objective setting. The basic idea of [27,34] is to implicitly unfold the 
MDP along cost epochs, and exploit the regularities of the epoch-MDPs. PRISM 
[37] and the MODEST TOOLSET [29] have been updated with such methods 
for the single-objective case and significantly outperform the explicit unfolding 
approach of [2,40]. This paper presents an algorithm that lifts this principle to 
multiple cost objectives and determines approximation errors when using value 
iteration. Extensions towards quantiles and expected costs are considered too. 
Evaluation using a prototypical implementation in STORM [20] shows promising 
results. In addition, we equip our algorithm with means to visualise (inspired by 
the recent techniques in [39]) the trade-offs between various objectives that go 
beyond Pareto curves; we believe that this is key to obtain better insights into 
multi-objective decision making. An example is given in Fig. 1(b): it depicts the 
probability to satisfy an objective based on the remaining energy (y-axis) and 
time (x-axis). 


Related work. The analysis of single-objective (cost-bounded) reachability in MDPs is an active area of research in both the AI and formal methods communities, and is referred to in, e.g., [18,35,48]. Various model checking approaches for single objectives exist. In [32], the topology of the unfolded MDP is exploited to speed up the value iteration. In [27], three different model checking approaches are explored and compared. A survey of heuristic approaches is given in [45]. A Q-learning based approach is described in [13]. An extension of this problem to the partially observable setting was considered in [14], and to probabilistic timed automata in [27]. The method from [4] computes optimal expected values under, e.g., the condition that the goal is reached, and is thus applicable in settings where a goal is not necessarily reached. A similar problem is considered in [46]. For multi-objective analysis, the model checking community typically focuses on probabilities and expected costs as in the seminal works [15,22]. Implementations are typically based on the value iteration approach of [24], and have been extended to stochastic games [16], Markov automata [42], and interval MDPs [28]. Other considered cases include, e.g., multi-objective mean-payoff objectives [8], objectives over instantaneous costs [10], and parity objectives [7]. Multi-objective problems for MDPs with an unknown cost function are considered in [33]. Surveys on multi-objective decision making in AI and machine learning can be found in [44] and [47], respectively.


2 Preliminaries 


We write 2^S for the powerset of S. The i-th component of a tuple t = (v_1, …, v_n) is t[i] ≝ v_i. A (discrete) probability distribution over a set Ω is a function μ ∈ Ω → [0,1] such that support(μ) ≝ {ω ∈ Ω | μ(ω) > 0} is countable and Σ_{ω ∈ support(μ)} μ(ω) = 1. Dist(Ω) is the set of all probability distributions over Ω. D(s) is the Dirac distribution for s, defined by D(s)(s) = 1.


Definition 1. A Markov decision process (MDP) with m cost structures is a triple M = (S, T, s_init) where S is a finite set of states, T ∈ S → 2^{Dist(ℕ^m × S)} is the transition function, and s_init ∈ S is the initial state. For all s ∈ S, we require that T(s) is finite and non-empty.


We write s →_T μ for μ ∈ T(s) and call it a transition. We write s →_{T,μ}^{c} s′ if additionally ⟨c, s′⟩ ∈ support(μ); ⟨c, s′⟩ is a branch with cost vector c. If T is clear from the context, we just write →. Graphically, transitions are lines to a node from which branches labelled with their probability and costs lead to successor states. We may omit the node and probability for transitions into Dirac distributions.


Example 1. Figure 2 shows an MDP M_ex. From the initial state s_0, the choice of going towards s_1 or s_2 is nondeterministic. Either way, the probability to return to s_0 is 0.5; otherwise we move to s_1 (or s_2). M_ex has two cost structures: failing to move to s_1 has a cost of 1 for the first and 2 for the second structure. Moving to s_2 yields cost 2 for the first and no cost for the second structure.


Multi-cost Bounded Reachability in MDP 323

In the remainder of this paper, we fix a given MDP M = (S, T, s_init). Its semantics is captured by the notion of paths. A path in M represents the infinite concrete resolution of both nondeterministic and probabilistic choices: π = s_0 μ_0 c_0 s_1 μ_1 c_1 … where s_i ∈ S, s_i →_T μ_i, and ⟨c_i, s_{i+1}⟩ ∈ support(μ_i) for all i ∈ ℕ. A finite path π_fin = s_0 μ_0 c_0 s_1 μ_1 c_1 s_2 … μ_{n−1} c_{n−1} s_n is a finite prefix of a path with last(π_fin) = s_n ∈ S. Let cost_i(π_fin) = Σ_{j=0}^{n−1} c_j[i]. Paths_fin(M) (Paths(M)) is the set of all finite (infinite) paths starting in s_init. A scheduler (adversary, policy or strategy) resolves nondeterministic choices:


Definition 2. 𝔖 ∈ Paths_fin(M) → Dist(Dist(ℕ^m × S)) is a scheduler for M if ∀π_fin: μ ∈ support(𝔖(π_fin)) ⟹ last(π_fin) →_T μ. The set of all schedulers of M is Sched(M). 𝔖 is deterministic if |support(𝔖(π))| = 1 for all finite paths π.


Via the standard cylinder set construction [25], a scheduler 𝔖 induces a probability measure P^𝔖_M on measurable sets of paths starting from s_init. We define the extremal values P^max_M(Π) = sup_{𝔖 ∈ Sched(M)} P^𝔖_M(Π) and P^min_M(Π) = inf_{𝔖 ∈ Sched(M)} P^𝔖_M(Π) for measurable Π ⊆ Paths(M). For clarity, we focus on probabilities in this paper, but note that expected accumulated costs can be defined analogously [25] and our methods apply to them with only minor changes.


Cost-Bounded Reachability. We are interested in the probabilities of sets of 
paths that reach certain goal states within multiple cost bounds: 


Definition 3. A cost bound is given by ⟨C_j⟩_{∼b} G where j ∈ {1, …, m} identifies a cost structure, ∼ ∈ {<, ≤, >, ≥}, b ∈ ℕ is a bound value, and G ⊆ S is a set of goal states. A cost-bounded reachability formula is a conjunction ⋀_{i=1}^{n} (⟨C_{j_i}⟩_{∼_i b_i} G_i) of cost bounds. It characterises the measurable set of paths Π where, for every i, every π ∈ Π has a prefix π^i_fin with last(π^i_fin) ∈ G_i and cost_{j_i}(π^i_fin) ∼_i b_i.


A (single-objective) multi-cost bounded reachability query asks for P^opt_M(e) where opt ∈ {max, min} and e is a cost-bounded reachability formula. Unbounded
and step-bounded reachability are special cases of cost-bounded reachability. 
A single-objective query may contain multiple bounds, but asks for a single 
scheduler that optimises the probability of satisfying them all. 
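Definition 3 can be checked directly on a single finite path by accumulating the cost vectors along its prefixes. The following Python sketch uses a hypothetical encoding of a bound ⟨C_j⟩_{∼b} G as a tuple (j, "∼", b, G) with 0-based cost indices, and tests whether every bound is met by some prefix. The path in the example is made up and only loosely inspired by M_ex (in particular, the zero cost of the successful move to s_1 is an assumption).

```python
import operator
from typing import FrozenSet, List, Sequence, Tuple

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# A cost bound <C_j>_{~b} G encoded as (j, "~", b, G), with a 0-based index j.
Bound = Tuple[int, str, int, FrozenSet[str]]


def satisfies(states: Sequence[str], costs: Sequence[Sequence[int]],
              bounds: List[Bound]) -> bool:
    """Does the finite path s_0 c_0 s_1 ... c_{n-1} s_n satisfy the conjunction of bounds?

    Each bound needs a prefix ending in its goal set whose accumulated cost in
    the bound's cost structure satisfies the comparison.
    """
    if len(costs) != len(states) - 1 or not costs:
        raise ValueError("expected n >= 1 cost vectors for n + 1 states")
    acc = [[0] * len(costs[0])]  # acc[k]: accumulated cost vector of the prefix ending in s_k
    for c in costs:
        acc.append([x + y for x, y in zip(acc[-1], c)])
    return all(
        any(s in goal and OPS[op](acc[k][j], b) for k, s in enumerate(states))
        for j, op, b, goal in bounds
    )


# One failed attempt (cost (1, 2)), then a successful move to s1
path_states = ["s0", "s0", "s1"]
path_costs = [(1, 2), (0, 0)]
print(satisfies(path_states, path_costs, [(0, "<=", 1, frozenset({"s1"}))]))  # True
print(satisfies(path_states, path_costs, [(0, "<=", 0, frozenset({"s1"}))]))  # False
```

A query P^opt_M(e) then asks for the optimal probability mass of infinite paths that have such prefixes, which is what the rest of the paper computes.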

We also consider multi-objective tradeoffs, i.e. sets of single-objective queries written as Φ = multi(P^{opt_1}_M(e_1), …, P^{opt_ℓ}_M(e_ℓ)). We call the e_k objectives. For tradeoffs, we are interested in the Pareto curve Pareto(M, Φ), which consists of all achievable probability vectors p_𝔖 = (P^𝔖_M(e_1), …, P^𝔖_M(e_ℓ)) for 𝔖 ∈ Sched(M) that are not dominated by another achievable vector p_{𝔖′}. More precisely, p_𝔖 ∈ Pareto(M, Φ) iff for all 𝔖′ ∈ Sched(M) either p_𝔖 = p_{𝔖′}, or for some i ∈ {1, …, ℓ} we have (opt_i = max ∧ p_𝔖[i] > p_{𝔖′}[i]) ∨ (opt_i = min ∧ p_𝔖[i] < p_{𝔖′}[i]).
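For a finite set of candidate vectors, the dominance condition above translates literally into code. This Python sketch keeps exactly the vectors p for which every other candidate q either equals p or is beaten by p in some objective. It does not account for randomised schedulers, which fill in the convex hull of such points, and the names `strictly_better` and `pareto_points` are hypothetical; the first two sample vectors are those of Example 2 and the others are made up.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def strictly_better(opt: str, x: float, y: float) -> bool:
    """x beats y for an objective with direction opt ("max" or "min")."""
    return x > y if opt == "max" else x < y


def pareto_points(points: List[Vec], opts: Sequence[str]) -> List[Vec]:
    """Keep p iff every candidate q equals p or p beats q in at least one objective."""
    return [
        p for p in points
        if all(q == p or any(strictly_better(o, pi, qi) for o, pi, qi in zip(opts, p, q))
               for q in points)
    ]


candidates = [(0.5, 1.0), (0.75, 0.75), (0.5, 0.75), (0.25, 0.5)]
print(pareto_points(candidates, ("max", "max")))  # [(0.5, 1.0), (0.75, 0.75)]
```

The check is quadratic in the number of candidates, which is fine for illustration but not how Pareto curves are approximated in practice.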


Example 2. We consider Φ = multi(P^max_{M_ex}(⟨C_1⟩_{≤1} {s_1}), P^max_{M_ex}(⟨C_2⟩_{≤3} {s_2})) for M_ex of Fig. 2. Let 𝔖_j be the scheduler that tries to move to s_1 for at most j attempts and afterwards moves to s_2. The induced probability vectors p_{𝔖_1} = (0.5, 1) and p_{𝔖_2} = (0.75, 0.75) both lie on the Pareto curve since no 𝔖 ∈ Sched(M_ex) induces (strictly) larger probabilities p_𝔖. By also considering schedulers that randomise between the choices of 𝔖_1 and 𝔖_2, we obtain Pareto(M_ex, Φ) = {w · p_{𝔖_1} + (1 − w) · p_{𝔖_2} | w ∈ [0,1]}.

Fig. 2. Example MDP M_ex

Fig. 3. An illustration of epochs: (a) the naive approach; (b) cost epochs


For clarity of presentation, we restrict to tradeoffs Φ where every cost structure occurs exactly once, i.e., the number m of cost structures of M matches the number of cost bounds occurring in Φ. Furthermore, we require that none of the sets of goal states contains the initial state. Both assumptions are w.l.o.g.: cost structures can be copied as needed, and a new initial state with a zero-cost transition to the old initial state can be added.


3 Multi-dimensional Sequential Value Iteration 


We present a practically efficient approach to compute (an approximation of) the Pareto curve for MDP M with m cost structures and tradeoff Φ. We merge the ideas of [24] to approximate a Pareto curve for an (unbounded) multi-objective tradeoff with those of [27,34] to efficiently compute (single-objective) cost-bounded reachability probabilities. For clarity of presentation, we start with the upper-bounded maximum case and assume a tradeoff of the form Φ = multi(P^max_M(e_1), …, P^max_M(e_ℓ)) with e_k = ⋀_{i=n_{k−1}+1}^{n_k} (⟨C_i⟩_{≤b_i} G_i) and 0 = n_0 < n_1 < ⋯ < n_ℓ = m. Other variants are discussed in Sect. 3.3.


Cost epochs and goal satisfaction. Central to our approach is the concept of cost epochs. Consider the path π = (s_0 (2,0) s_2 (0,0) s_0 (1,2))^ω through M_ex of Fig. 2. We plot the accumulated cost in both dimensions along this path in Fig. 3(a). Starting from (0,0), the first transition yields cost 2 for the first cost structure: we jump to coordinate (2,0). The next transition, back to s_0, has no cost, so we stay at (2,0). Finally, the failed attempt to move to s_1 incurs costs (1,2). Consequently, for an infinite path, infinitely many points in this grid may be reached. However, a tradeoff specifies bound values for the costs, e.g., for Φ_ex = multi(P^max_{M_ex}(⟨C_1⟩_{≤4} {s_1}), P^max_{M_ex}(⟨C_2⟩_{≤3} {s_2})) we get bound values 4 and 3. Once the bound value for a bound is reached, accumulating further costs in this dimension does not impact the satisfaction of its formula. It thus suffices to keep track, for each bound, of the remaining costs before reaching the bound value. This leads to a finite grid as depicted in Fig. 3(b). We refer to each of its coordinates as a cost epoch:


Definition 4. An m-dimensional cost epoch is a tuple in E_m ≝ (ℕ ∪ {⊥})^m. For e ∈ E_m and c ∈ ℕ^m, the successor epoch is succ(e, c)[i] = e[i] − c[i] if that value is non-negative and ⊥ otherwise.


If the entry for a bound is ⊥, the bound cannot be satisfied any more: too much cost has already been incurred. To check whether an objective e_k = ⋀_{i=n_{k−1}+1}^{n_k} (⟨C_i⟩_{≤b_i} G_i) is satisfied, we memorise whether each individual bound already holds. This is also used to ensure that satisfying a bound more than once has no effect.


Definition 5. A goal satisfaction g ∈ 𝒢_m ≝ {0,1}^m represents the cost structure indices i for which bound ⟨C_i⟩_{≤b_i} G_i already holds, i.e. G_i was reached before the bound value b_i. For g ∈ 𝒢_m, e ∈ E_m and s ∈ S, let succ(g, s, e) ∈ 𝒢_m define the update upon reaching s: succ(g, s, e)[i] = 1 if s ∈ G_i ∧ e[i] ≠ ⊥, and succ(g, s, e)[i] = g[i] otherwise.
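Definitions 4 and 5 translate into two small update functions. In this Python sketch, ⊥ is encoded as None, and an entry that is already ⊥ is assumed to stay ⊥ (Definition 4 does not spell this case out). The cost sequence replays the transitions of the path π discussed with Fig. 3(a), starting from the bound values 4 and 3; the function names are hypothetical.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

BOT = None  # encodes the epoch entry ⊥ (bound value already exceeded)
Epoch = Tuple[Optional[int], ...]
GoalSat = Tuple[int, ...]


def succ_epoch(e: Epoch, c: Sequence[int]) -> Epoch:
    """succ(e, c)[i] = e[i] - c[i] if non-negative, else ⊥ (assumed: ⊥ stays ⊥)."""
    return tuple(BOT if ei is BOT or ei - ci < 0 else ei - ci for ei, ci in zip(e, c))


def succ_goal(g: GoalSat, s: str, e: Epoch, goals: Sequence[FrozenSet[str]]) -> GoalSat:
    """succ(g, s, e)[i] = 1 if s is in G_i and e[i] is not ⊥, else g[i]."""
    return tuple(1 if s in gi and ei is not BOT else gv for gv, ei, gi in zip(g, e, goals))


# Costs (2,0), (0,0), (1,2), (2,0) along pi, starting from the bound values (4, 3)
e: Epoch = (4, 3)
trace: List[Epoch] = []
for c in [(2, 0), (0, 0), (1, 2), (2, 0)]:
    e = succ_epoch(e, c)
    trace.append(e)
print(trace)  # [(2, 3), (2, 3), (1, 1), (None, 1)]
```

Once the first entry becomes ⊥, reaching s_1 can no longer set the first goal-satisfaction bit, which is exactly the "too much cost" case described above.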


3.1 The Unfolding Approach 


Pareto(M, Φ) can be computed by reducing Φ to a multi-objective unbounded reachability problem on the unfolded MDP. Its states are the Cartesian product of the original MDP's states, the epochs, and the goal satisfactions:


Definition 6. The unfolding for M as in Definition 1 and upper-bounded
maximum tradeoff $\Phi$ is the MDP $M_{unf} = (S', T', (s_{init}, (b_1,\dots,b_m), \mathbf{0}))$ with $S' = S \times E_m \times G_m$,
no cost structures, $T'((s,e,g)) = \{\mathit{unf}(\mu) \in \mathit{Dist}(\mathbb{N}^0 \times S') \mid \mu \in T(s)\}$, and the unfolding of
probability distribution $\mu$ defined by $\mathit{unf}(\mu)((s',e',g')) = \mu((c,s'))$ if $e' = \mathit{succ}(e,c) \wedge g' = \mathit{succ}(g,s',e')$ and 0
otherwise.


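Costs are now encoded in the state space, so it suffices to consider the unbounded
tradeoff $\Phi' = \mathit{multi}(P^{\max}_{M_{unf}}(e'_1), \dots, P^{\max}_{M_{unf}}(e'_\ell))$ with $e'_k = \langle\cdot\rangle_{\ge 0} G'_k$, i.e. unbounded reachability of
$G'_k = \{(s,e,g) \mid g[i] = 1 \text{ for every bound } i \text{ of } e_k\}$.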


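Lemma 1. There is a bijection $f\colon \mathit{Sched}(M) \to \mathit{Sched}(M_{unf})$ with $P^{\sigma}_{M}(e_k) =
P^{f(\sigma)}_{M_{unf}}(e'_k)$ for all $\sigma \in \mathit{Sched}(M)$ and $k \in \{1,\dots,\ell\}$. Consequently, we have that
$\mathit{Pareto}(M,\Phi) = \mathit{Pareto}(M_{unf}, \Phi')$.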


$\mathit{Pareto}(M_{unf}, \Phi')$ can be computed with existing multi-objective model checking
algorithms for unbounded reachability. We build on the one of [24]. It iteratively
chooses weight vectors $w = (w_1,\dots,w_\ell) \in [0,1]^\ell \setminus \{\mathbf{0}\}$ and computes points

$$p_w = \big(P^{\sigma}_{M_{unf}}(e'_1), \dots, P^{\sigma}_{M_{unf}}(e'_\ell)\big) \quad\text{with}\quad \sigma \in \arg\max_{\sigma'} \sum_{k=1}^{\ell} w_k \cdot P^{\sigma'}_{M_{unf}}(e'_k). \qquad (1)$$

The Pareto curve $\mathcal{P}$ is convex, $p_w \in \mathcal{P}$ for all w, and $q \in \mathcal{P}$ implies $q \cdot w \le p_w \cdot w$.
These observations allow us to approximate the Pareto curve with arbitrary
precision; see [24] for details. [24] characterises $p_w$ via weighted expected costs:
$M_{unf}$ is equipped with $\ell$ cost structures used to calculate the probability of each
of the $\ell$ objectives. This is achieved by setting the value of the k-th cost structure
on each branch to 1 iff the objective $e'_k$ is satisfied in the target state of the branch
but was not satisfied in the transition's source state. On a path $\pi$ through the
resulting model $M^{+}_{unf}$ we collect exactly one cost w.r.t. cost structure k iff $\pi$
satisfies objective $e'_k$.


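Definition 7. For $\sigma \in \mathit{Sched}(M^{+}_{unf})$ and $w \in [0,1]^\ell$, the weighted expected
cost is $\mathbb{E}^{\sigma}_{M^{+}_{unf}}(w) = \sum_{k=1}^{\ell} w[k] \cdot \int_{\pi \in \mathit{Paths}(M^{+}_{unf})} \mathit{cost}_k(\pi)\, d\Pr^{\sigma}_{M^{+}_{unf}}(\pi)$, i.e. the expected
value of the weighted sum of the costs accumulated on paths in $M^{+}_{unf}$.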


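The following characterisation of $p_w$ is equivalent to Eq. 1:

$$p_w = \big(\mathbb{E}^{\sigma}_{M^{+}_{unf}}(\mathbf{1}_1), \dots, \mathbb{E}^{\sigma}_{M^{+}_{unf}}(\mathbf{1}_\ell)\big) \quad\text{where}\quad \sigma \in \arg\max_{\sigma'} \mathbb{E}^{\sigma'}_{M^{+}_{unf}}(w) \qquad (2)$$

and $\mathbf{1}_k \in \{0,1\}^\ell$ is the weight vector defined by $\mathbf{1}_k[j] = 1$ iff $j = k$. Standard
MDP model checking algorithms [41] can be applied to compute an optimal
(deterministic and memoryless) scheduler $\sigma$ and the induced costs $\mathbb{E}^{\sigma}_{M^{+}_{unf}}(\mathbf{1}_k)$.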
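To make Eq. 2 concrete, the following Python sketch computes $p_w$ on a tiny, hypothetical MDP that already carries the objective encoding (cost 1 in dimension k on the branch that newly satisfies objective k). It first finds a deterministic memoryless scheduler maximising the weighted expected cost and then evaluates each $\mathbf{1}_k$ under that scheduler. It uses plain value iteration with a fixed iteration count, which is adequate here only because the toy model is acyclic; in general value iteration gives no accuracy guarantee [26]. All names are assumptions, not STORM's API.

```python
from typing import Dict, List, Sequence, Tuple

# An action is a list of branches (probability, cost vector, successor state).
Action = List[Tuple[float, Tuple[float, ...], str]]
Model = Dict[str, List[Action]]


def q_value(action: Action, w: Sequence[float], val: Dict[str, float]) -> float:
    """Expected weighted cost of one step plus the value of the successor."""
    return sum(p * (sum(wk * ck for wk, ck in zip(w, c)) + val[t]) for p, c, t in action)


def optimise(model: Model, w: Sequence[float], iters: int = 100) -> Dict[str, int]:
    """Value iteration for max_sigma E^sigma(w); returns a deterministic memoryless scheduler."""
    val = {s: 0.0 for s in model}
    for _ in range(iters):
        # states without actions are absorbing and collect no further cost
        val = {s: max((q_value(a, w, val) for a in acts), default=0.0)
               for s, acts in model.items()}
    return {s: max(range(len(acts)), key=lambda i: q_value(acts[i], w, val))
            for s, acts in model.items() if acts}


def evaluate(model: Model, sched: Dict[str, int], k: int, ell: int,
             iters: int = 100) -> Dict[str, float]:
    """Expected cost for the unit weight vector 1_k under a fixed scheduler."""
    unit = [1.0 if j == k else 0.0 for j in range(ell)]
    val = {s: 0.0 for s in model}
    for _ in range(iters):
        val = {s: q_value(model[s][sched[s]], unit, val) if s in sched else 0.0
               for s in model}
    return val


def point(model: Model, init: str, w: Sequence[float]) -> Tuple[float, ...]:
    """p_w as in Eq. 2: optimise the weighted sum, then read off each objective."""
    sched = optimise(model, w)
    return tuple(evaluate(model, sched, k, len(w))[init] for k in range(len(w)))


toy: Model = {
    "s0": [
        [(0.5, (1.0, 0.0), "done"), (0.5, (0.0, 0.0), "done")],  # favours objective 1
        [(0.8, (0.0, 1.0), "done"), (0.2, (0.0, 0.0), "done")],  # favours objective 2
    ],
    "done": [],
}
print(point(toy, "s0", (0.5, 0.5)))  # (0.0, 0.8)
print(point(toy, "s0", (0.9, 0.1)))  # (0.5, 0.0)
```

The two weight vectors yield two different Pareto points. For w = (0.5, 0.5), the other point (0.5, 0.0) has weighted value 0.25, which does not exceed $p_w \cdot w = 0.4$, in line with the property $q \cdot w \le p_w \cdot w$.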


3.2 An Epoch Model Approach Without Unfolding 


The unfolding approach does not scale well: if the original MDP has n states,
the unfolding will have on the order of $n \cdot \prod_{i=1}^{m} (b_i + 2)$ states. This makes it
infeasible for larger bound values $b_i$ over multiple bounds. The bottleneck lies
in computing the points $p_w$ as in Eqs. 1 and 2. We now show how to do so
efficiently, i.e. given a weight vector $w = (w_1,\dots,w_\ell) \in [0,1]^\ell \setminus \{\mathbf{0}\}$, compute

$$p_w = \big(P^{\sigma}_{M}(e_1), \dots, P^{\sigma}_{M}(e_\ell)\big) \quad\text{with}\quad \sigma \in \arg\max_{\sigma'} \sum_{k=1}^{\ell} w_k \cdot P^{\sigma'}_{M}(e_k) \qquad (3)$$

without unfolding. The characterisations of $p_w$ given in Eqs. 1 and 3 are equivalent
due to Lemma 1.

The efficient analysis of single-objective queries with a single bound $\varphi =
P^{\max}_{M}(\langle C\rangle_{\le b} G)$ has recently been addressed in e.g. [27,34]. The key observation
is that the unfolding $M_{unf}$ can be decomposed into b + 2 epoch model MDPs
$M^{b},\dots,M^{0},M^{\bot}$ corresponding to the cost epochs. The epoch models are copies
of M with only slight adaptations. Reachability probabilities in copies corresponding
to epoch i only depend on the copies $\{M^{j} \mid j < i \vee j = \bot\}$. It is thus
possible to analyse $M^{\bot},\dots,M^{b}$ sequentially instead of considering all copies at
once. In particular, it is not necessary to construct the full unfolding.

We lift this idea to multi-objective tradeoffs. The single-objective case is
notably simpler in that reaching a goal state for the first time or exceeding
the cost bound immediately suffices to determine whether the one property is
satisfied. In particular, while $M^{\bot}$ is just one sink state in the single-objective
case, its structure is more involved here.

Fig. 4. An epoch model of $M_{ex}$

We first formalise the notion of epoch models for multiple bounds. The aim
is to build an MDP for each epoch $e \in E_m$ that can be analysed via standard
model checking techniques using the weighted expected cost encoding of objective
probabilities. The state space of an epoch model consists of up to one copy
of each original state for each goal satisfaction vector $g \in G_m$. Additional sink
states $(s_\bot, g)$ encode the target for a jump to any other cost epoch $e' \neq e$.
We consider $\ell$ cost structures to encode the objective probabilities. Let function
$\mathit{satObj}_\Phi\colon G_m \times G_m \to \{0,1\}^\ell$ assign value 1 in entry k iff a reachability property
$e_k$ is satisfied according to the second goal vector but was not satisfied in the
first. For the transitions' branches, we distinguish two cases: (1) If the successor
epoch $e' = \mathit{succ}(e,c)$ with respect to the original cost $c \in \mathbb{N}^m$ is the same as
the current epoch e, we jump to the successor state as before and update the
goal satisfaction. We collect the new costs for the objectives if updating the goal
satisfaction newly satisfies an objective, as given by $\mathit{satObj}_\Phi$. (2) If the successor
epoch $e' = \mathit{succ}(e,c)$ is different from the current epoch e, the probability is
rerouted to the sink state with the corresponding goal satisfaction vector.
The collected costs contain the part of the goal satisfaction as in (1), but also
the results obtained by analysing the reached epoch e', given by a function f.
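A possible encoding of $\mathit{satObj}_\Phi$ is sketched below in Python. It assumes, hypothetically, that each objective $e_k$ is represented by the list of indices of its bounds, so that $e_k$ holds in a goal vector exactly when all of those entries are 1.

```python
from typing import Sequence, Tuple


def sat_obj(g: Sequence[int], g_new: Sequence[int],
            objectives: Sequence[Sequence[int]]) -> Tuple[int, ...]:
    """satObj_Phi(g, g'): entry k is 1 iff objective k holds in g' but not yet in g."""
    def holds(vec: Sequence[int], bound_indices: Sequence[int]) -> bool:
        # an objective is a conjunction of bounds: all its entries must be 1
        return all(vec[i] == 1 for i in bound_indices)

    return tuple(int(holds(g_new, b) and not holds(g, b)) for b in objectives)


# Phi_ex has one bound per objective; a conjunctive objective uses both bounds.
print(sat_obj((0, 0), (0, 1), [[0], [1]]))  # (0, 1)
print(sat_obj((1, 1), (1, 1), [[0, 1]]))    # (0,): already satisfied, no new cost
```

Because the entry is 1 only on the branch where the objective becomes satisfied, each objective contributes at most one unit of cost per path, which is what makes the expected cost equal to the satisfaction probability.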


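Definition 8. The epoch model of MDP M as in Definition 1 for $e \in E_m$ and
a function $f\colon G_m \times \mathit{Dist}(\mathbb{N}^m \times S) \to [0,1]^\ell$ is the MDP $M^e_f = (S^e, T^e_f, (s_{init}, \mathbf{0}))$
with $\ell$ cost structures, $S^e = (S \uplus \{s_\bot\}) \times G_m$, $T^e_f((s_\bot,g)) = \{\mathcal{D}((\mathbf{0}, (s_\bot,g)))\}$,
and for every $\hat{s} = (s,g) \in S^e$ and $\mu \in T(s)$, there is some $\nu \in T^e_f(\hat{s})$ defined by:

1. $\nu((\mathit{satObj}_\Phi(g,g'), (s',g'))) = \mu((c,s'))$ if $\mathit{succ}(e,c) = e \wedge g' = \mathit{succ}(g,s',e)$,
and

2. $\nu((f(g,\mu) + \mathit{satObj}_\Phi(g,g'), (s_\bot,g'))) = \sum_{c \in C} \sum_{s' \in S|_{g'}} \mu((c,s'))$ where $C = \{c \mid
\mathit{succ}(e,c) \neq e\}$ and $S|_{g'} = \{s' \mid \mathit{succ}(g, s', \mathit{succ}(e,c)) = g'\}$.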


Figure 4 shows an epoch model $M^e_f$ of the MDP $M_{ex}$ in Fig. 2 with respect to
tradeoff $\Phi_{ex}$ as in Example 2 and any epoch $e \in E_2$ with $e[1] \neq \bot$ and $e[2] \neq \bot$.
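Input : MDP $M = (S, T, s_{init})$, tradeoff $\Phi = \mathit{multi}(P^{\max}_{M}(e_1),\dots,P^{\max}_{M}(e_\ell))$
        with bound values $b_1,\dots,b_m$, weight vector $w \in [0,1]^\ell$ and proper
        epoch sequence $\mathfrak{E}$ ending with $\mathit{last}(\mathfrak{E}) = (b_1,\dots,b_m)$
Output : Point $p_w \in \mathbb{R}^\ell$ satisfying Eq. 3

 1  foreach $e \in \mathfrak{E}$ in ascending order do
 2      foreach $g \in G_m$, $\mu \in \{\nu \mid \exists s\colon \nu \in T(s)\}$ do
 3          $z \leftarrow \mathbf{0}$
 4          foreach $(c, s') \in \mathit{support}(\mu)$ do
 5              $e' \leftarrow \mathit{succ}(e,c)$; $g' \leftarrow \mathit{succ}(g, s', e')$
 6              if $e' \neq e$ then
 7                  $z \leftarrow z + \mu((c,s')) \cdot x^{e'}[(s',g')]$
 8          $f(g,\mu) \leftarrow z$
 9      build epoch model $M^e_f = (S^e, T^e_f, (s_{init}, \mathbf{0}))$
10      $\sigma \leftarrow \arg\max_{\sigma'} \mathbb{E}^{\sigma'}_{M^e_f}(w)$
11      foreach $k \in \{1,\dots,\ell\}$, $\hat{s} \in S^e$ do
12          $x^e[\hat{s}][k] \leftarrow \mathbb{E}^{\sigma}_{M^e_f}(\mathbf{1}_k)[\hat{s}]$
13  return $x^{\mathit{last}(\mathfrak{E})}[(s_{init}, \mathbf{0})]$

Algorithm 1. Sequential multi-cost bounded analysis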


Remark 1. The structure of $M^e_f$ differs only slightly between epochs. In particular,
consider epochs e, e' with $e[i] = \bot$ iff $e'[i] = \bot$. To construct epoch model
$M^{e'}_{f'}$ from $M^e_f$, only transitions to the bottom states $(s_\bot,g)$ need to be adapted.


To analyse an epoch model $M^e_f$, any successor epoch e' of e needs to be analysed
before. Since costs are non-negative, we can ensure this by analysing the epochs
in a specific order. In the single-dimensional case the order is uniquely given
by $\bot, 0, 1, \dots, b$. For multiple cost bounds any linearisation of the partial order
$\preceq \subseteq E_m \times E_m$ with $e' \preceq e$ iff $e'[i] \le e[i] \vee e'[i] = \bot$ for all i can be considered.
We call such a linearisation a proper epoch sequence.
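One simple way to obtain a proper epoch sequence, sketched below under the assumption that ⊥ is encoded as `None`, is to sort all epochs by the sum of their entries with ⊥ counted as −1. If $e' \preceq e$ and $e' \neq e$, then $e'$ is componentwise no larger and strictly smaller somewhere, so its sum is strictly smaller and it is placed earlier. This is one valid linearisation, not necessarily the one used in the implementation; the function names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

Epoch = Tuple[Optional[int], ...]


def proper_epoch_sequence(bounds: Sequence[int]) -> List[Epoch]:
    """All epochs with entries in {⊥, 0, ..., b_i}, ordered so that e' ⪯ e comes first."""
    ranges = [range(-1, b + 1) for b in bounds]  # -1 temporarily stands for ⊥
    raw = sorted(product(*ranges), key=sum)      # strictly smaller sum for e' ≺ e
    return [tuple(None if v == -1 else v for v in e) for e in raw]


def precedes(e1: Epoch, e2: Epoch) -> bool:
    """The partial order: e1 ⪯ e2 iff for every i, e1[i] is ⊥ or e1[i] <= e2[i]."""
    return all(a is None or (b is not None and a <= b) for a, b in zip(e1, e2))


seq = proper_epoch_sequence((4, 3))
print(len(seq), seq[0], seq[-1])  # 30 (None, None) (4, 3)
```

The length $\prod_i (b_i + 2)$ is the same factor that appears in the size of the unfolding, but here each epoch is handled as a separate, much smaller epoch model.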

We compute the points $p_w$ by analysing the different epoch models (i.e.
the coordinates of Fig. 3(b)) sequentially. The main procedure is outlined in
Algorithm 1. The costs of the model for the current epoch are computed in
lines 2-8. These costs comprise the results from previously analysed epochs e'.
In lines 9-12, the current epoch model $M^e_f$ is built and analysed: we compute
weighted expected costs on $M^e_f$, where $\mathbb{E}^{\sigma}_{M^e_f}(w)[s]$ denotes the expected costs
for $M^e_f$ when changing the initial state to s. In line 10, a (deterministic and
memoryless) scheduler $\sigma$ that induces the maximal weighted expected costs
(i.e. $\mathbb{E}^{\sigma}_{M^e_f}(w)[s] = \max_{\sigma'} \mathbb{E}^{\sigma'}_{M^e_f}(w)[s]$ for all states s) is computed. In line 12 we
then compute the expected costs induced by $\sigma$ for the individual objectives.


Theorem 1. The output of Algorithm 1 satisfies Eq. 3. 


Proof (sketch). Let e be the currently analysed epoch. Since $\mathfrak{E}$ is assumed to be
a proper epoch sequence, we already processed any reachable successor epoch e'
of e, i.e., line 7 is only executed for epochs e' for which $x^{e'}$ has already been
computed. One can show that the values $x^e[(s,g)][k]$ computed by the algorithm
coincide with the probability to satisfy $e'_k$ from state (s,e,g) in the unfolding
$M_{unf}$ under a scheduler $\sigma$ that maximises the weighted sum.


Error propagation. So far, we assumed that (weighted) expected costs $\mathbb{E}^{\sigma}_{M^e_f}(w)$
are computed exactly. Practical implementations, however, are often based on
numerical methods that only approximate the correct solution. In fact, methods
based on value iteration, the de-facto standard in MDP model checking, do
not give any guarantee on the accuracy of the obtained result [26]. We therefore
consider interval iteration [5,9], which for a predefined precision $\varepsilon > 0$ guarantees
that the obtained result $x_s$ is $\varepsilon$-precise, i.e. we have $|x_s - \mathbb{E}^{\sigma}_{M^e_f}(w)[s]| \le \varepsilon$.

For the single-cost bounded variant of Algorithm 1, [27] discusses that in
order to compute $P^{\max}_{M}(\langle C\rangle_{\le b} G)$ with precision $\varepsilon$, each epoch model needs to
be analysed with precision $\frac{\varepsilon}{b+1}$. We generalise this result to multi-dimensional
tradeoffs. Assume the results of previously analysed epochs (given by f) are $\varepsilon$-precise
and that $M^e_f$ is analysed with precision $\delta$. As in the single-dimensional
case, the total error for $M^e_f$ can accumulate to $\delta + \varepsilon$. Since a path through the
MDP M can visit at most $\sum_{i=1}^{m} (b_i + 1)$ cost epochs whose analysis introduces
error $\delta$, the overall error can be upper bounded by $\delta \cdot \sum_{i=1}^{m} (b_i + 1)$.


Theorem 2. If the values $x^e[\hat{s}][k]$ at line 12 of Algorithm 1 are computed with
precision $\varepsilon / \sum_{i=1}^{m} (b_i+1)$ for some $\varepsilon > 0$, the output $p'_w$ of the algorithm satisfies
$|p_w - p'_w| \cdot w \le \varepsilon$, where $p_w$ is as in Eq. 3.


Remark 2. Alternatively, epochs can be analysed with the desired overall precision
$\varepsilon$ by lifting the results from topological interval iteration [5]. However, that
requires storing the obtained bounds for the results of already analysed epochs.


3.3 Extensions 

Minimising objectives. Objectives $P^{\min}_{M}(e_k)$ can be handled by extending the
function $\mathit{satObj}_\Phi$ in Definition 8 such that it assigns cost $-1$ to branches that lead
to the satisfaction of $e_k$. To obtain the desired probabilities we then maximise
negative costs and multiply the result by $-1$ afterwards. As interval iteration
supports mixtures of positive and negative costs [5], arbitrary combinations of
minimising and maximising objectives can be considered.


Beyond upper bounds. Our approach also supports bounds of the form $\langle C_j\rangle_{\sim b} G$
for $\sim \in \{<, \le, >, \ge\}$, i.e., we allow combinations of lower and upper cost bounds.
Since costs are natural numbers, for strict upper bounds $< b$ and non-strict lower bounds $\ge b$
we consider $\le b-1$ and $> b-1$ instead. For a bound $\langle C_i\rangle_{> b_i} G_i$ we adapt the update
of goal satisfactions such that $\mathit{succ}(g,s,e)[i] = 1$ if either $g[i] = 1$ or $s \in G_i \wedge e[i] = \bot$. Similarly, we
support multi-bounded-single-goal queries of the form $\langle C_{(j_1,\dots,j_n)}\rangle_{(\sim_1 b_1,\dots,\sim_n b_n)} G$,
which characterise the paths $\pi$ with a single prefix $\pi_{fin}$ satisfying $\mathit{last}(\pi_{fin}) \in G$
and all cost bounds, i.e., $\mathit{cost}_{j_i}(\pi_{fin}) \sim_i b_i$.
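The bound rewriting and the adapted goal update can be sketched as follows. This is an illustrative Python sketch with hypothetical names; ⊥ is encoded as `None`, and a rewritten bound of −1 (from `< 0` or `>= 0`) is assumed to start out as ⊥.

```python
from typing import Optional, Tuple


def normalise(op: str, b: int) -> Tuple[str, int]:
    """Rewrite a bound so that only <= and > remain (costs are natural numbers)."""
    if op == "<":   # cost < b  iff  cost <= b - 1
        return "<=", b - 1
    if op == ">=":  # cost >= b iff  cost > b - 1
        return ">", b - 1
    if op in ("<=", ">"):
        return op, b
    raise ValueError(f"unsupported comparison {op!r}")


def succ_goal_entry(op: str, g_i: int, s_in_goal: bool, e_i: Optional[int]) -> int:
    """Update one goal-satisfaction entry; e_i is the successor epoch entry (None = ⊥)."""
    if op == "<=":
        # Definition 5: G_i is reached while the remaining budget is not exhausted
        return 1 if (s_in_goal and e_i is not None) else g_i
    if op == ">":
        # lower bound: G_i is reached after the budget b has been exceeded (entry is ⊥)
        return 1 if (g_i == 1 or (s_in_goal and e_i is None)) else g_i
    raise ValueError(f"normalise the bound first, got {op!r}")


print(normalise("<", 5), normalise(">=", 3))  # ('<=', 4) ('>', 2)
```

The same epoch bookkeeping thus serves both bound directions: an upper bound is satisfied while its entry is still a number, a lower bound only once its entry has become ⊥.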


1 This supersedes a restriction of the algorithm of [24]. 
Fig. 5. Pareto curves: (a) Pareto curve for $\mathit{multi}(obj_{100}, obj_{140})$; (b) Optimal schedulers for 3 objectives


Example 3. The formula $e = \langle C_{(1,1)}\rangle_{(\le 1, \ge 1)} G$ expresses the paths that reach G
while collecting exactly one cost w.r.t. the first cost structure. This formula is
not equivalent to $e' = \langle C_1\rangle_{\le 1} G \wedge \langle C_1\rangle_{\ge 1} G$ since, e.g., for $G = \{s_0\}$ the path
$\pi = s_0\langle 2\rangle s_0$ satisfies e' but not e.
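The difference between the two formulas of Example 3 can be checked mechanically on finite prefixes. The following Python sketch assumes a single cost structure (costs are plain integers); the function names are hypothetical.

```python
import operator
from typing import Iterator, List, Sequence, Set, Tuple

OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt, ">": operator.gt}
Bound = Tuple[str, int]


def prefixes(states: Sequence[str], costs: Sequence[int]) -> Iterator[Tuple[str, int]]:
    """Yield (last state, accumulated cost) for every finite prefix of the path."""
    acc = 0
    yield states[0], acc
    for c, s in zip(costs, states[1:]):
        acc += c
        yield s, acc


def single_goal(states: Sequence[str], costs: Sequence[int],
                goal: Set[str], bounds: List[Bound]) -> bool:
    """Multi-bounded single goal: ONE prefix ending in G must meet all bounds at once."""
    return any(s in goal and all(OPS[op](acc, b) for op, b in bounds)
               for s, acc in prefixes(states, costs))


def conjunction(states: Sequence[str], costs: Sequence[int],
                goal: Set[str], bounds: List[Bound]) -> bool:
    """Conjunction of single bounds: each bound may be met by a different prefix."""
    return all(any(s in goal and OPS[op](acc, b) for s, acc in prefixes(states, costs))
               for op, b in bounds)


path, costs, G = ["s0", "s0"], [2], {"s0"}
exactly_one = [("<=", 1), (">=", 1)]
print(conjunction(path, costs, G, exactly_one))  # True: e' holds
print(single_goal(path, costs, G, exactly_one))  # False: e does not
```

The prefix $s_0$ (cost 0) witnesses the upper bound and the prefix $s_0\langle 2\rangle s_0$ (cost 2) the lower bound, but no single prefix has cost exactly 1.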


Expected cost objectives. We can consider cost-bounded expected cost objectives
$\mathbb{E}^{opt}(R_{j_1}, \langle C_{j_2}\rangle_{\le b})$ with $opt \in \{\max, \min\}$, which refer to the expected cost
accumulated for cost structure $j_1$ within a given cost bound $\langle C_{j_2}\rangle_{\le b}$. Similar to
cost-bounded reachability queries, we compute cost-bounded expected costs via
computing (weighted) expected costs within epoch models.


Quantiles. A (multi-dimensional) quantile has the form $\mathit{Qu}(P^{opt}_{M}(e) \sim p)$ for
$opt \in \{\min, \max\}$, $\sim \in \{<, \le, >, \ge\}$, $e = \bigwedge_{i=1}^{n} \langle C_{j_i}\rangle_{\le b_i} G_i$ with the bound values $b_i$ left open,
and a fixed probability threshold $p \in [0,1]$. The quantile asks for the set of bound values B that
satisfy the probability threshold, i.e., $B = \{(b_1,\dots,b_n) \mid P^{opt}_{M}(e) \sim p\}$. The computation
of quantiles for single-cost bounded reachability has been discussed
in [3,34], where multiple cost bounds are supported via unfolding. Unfolding
requires fixing the bound values $b_2,\dots,b_n$ a priori, and one can only ask for all $b_1$
that satisfy the property. Our approach provides the basis for lifting the ideas
of [3,34] to multi-bounded queries. Roughly, one extends the epoch sequence
$\mathfrak{E}$ in Algorithm 1 dynamically until the epochs in which the bounded reachability
probability passes the threshold p are explored. Additional steps such as
detecting the case where $B = \emptyset$ are left for future work.


4 Visualisations 


The results of a multi-objective model checking analysis are typically presented
as a single (approximation of a) Pareto curve. For more than two objectives, the
performance of the Pareto-optimal schedulers can be displayed in a bar chart as
in Fig. 5(b), where the colours reflect different objectives and the groups different
schedulers. The aim is to visualise the tradeoffs between the different objectives
such that the user can make an informed decision about the system design or
pick a scheduler for implementation. However, Pareto set visualisations alone
Fig. 6. Two-dimensional plots of Pareto-optimal schedulers for different quantities; (a) remaining scientific value requirement and the probabilities of the two objectives (Color figure online)


may not provide sufficient information about, e.g., which objectives are aligned
or conflicting (see e.g. [39] for a discussion in the non-probabilistic case). Cost
bounds furthermore add an extra dimension for each cost structure. Consider
the Mars rover MDP $M_r$ and tradeoff $\mathit{multi}(obj_{100}, obj_{140})$ with

$$obj_v = P^{\max}_{M_r}(\langle C_{time}\rangle_{\le 175} B \wedge \langle C_{energy}\rangle_{\le 100} B \wedge \langle C_{value}\rangle_{\ge v} B)$$


where B is the set of states where the rover has safely returned to its base.
We ask for the tradeoff between performing experiments of scientific value at
least 100 before returning to base within 175 time units and maximum energy
consumption of 100 units ($obj_{100}$) vs. achieving the same with scientific value
at least 140 ($obj_{140}$). The Pareto curve (Fig. 5(a)) shows the tradeoff between
achieving $obj_{100}$ and $obj_{140}$. However, for each Pareto-optimal scheduler, our
method has implicitly computed the probabilities of the two objectives for all
reachable epochs as well, i.e. for all bounds on the three quantities below the ones
required in the tradeoff. We visualise this information to gain deeper insight into the
behaviour of each scheduler, its robustness w.r.t. the bounds, and its preferences
for certain objectives depending on the remaining budget for each quantity.

We use plots as shown in Fig. 6. They can be generated at no extra cost in runtime or
memory since all required data is already computed implicitly. We restrict ourselves to
two-dimensional plots since they are easier to grasp than complex three-dimensional
visualisations. In each plot, we can thus show the relationship between three different
quantities: one on the x-axis (x), one on the y-axis (y), and one encoded as
the colour of the points (z, where we use blue for high values, red for low values,
black for probability zero, and white for unreachable epochs). Yet our example
tradeoff already contains five quantities: the probability for $obj_{100}$, the probability
for $obj_{140}$, the available time and energy to be spent, and the remaining
scientific value to be accumulated. We thus need to project out some quantities.
We do this by showing at every (x, y) coordinate the maximum or minimum value
of the z quantity when ranging over all reachable values of the hidden costs at
this coordinate. That is, we show a best- or worst-case situation, depending on
the semantics of the respective quantities.
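For plots whose axes are two remaining budgets (as in Fig. 6(c), time vs. energy), the projection can be sketched as a group-by over epochs. The Python sketch below uses made-up probabilities indexed by hypothetical (time, energy, value) epochs; coordinates that never occur correspond to unreachable epochs, which the plots draw in white.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Epoch = Tuple[int, ...]


def project(values: Dict[Epoch, float], x_dim: int, y_dim: int,
            agg: Callable[[Iterable[float]], float] = max) -> Dict[Tuple[int, int], float]:
    """Keep two epoch dimensions as (x, y) and aggregate z over all hidden dimensions."""
    buckets: Dict[Tuple[int, int], List[float]] = {}
    for epoch, z in values.items():
        buckets.setdefault((epoch[x_dim], epoch[y_dim]), []).append(z)
    # min gives the worst-case view, max the best-case view
    return {xy: agg(zs) for xy, zs in buckets.items()}


# Made-up probabilities indexed by (time, energy, value) epochs.
probs = {(10, 5, 2): 0.3, (10, 5, 0): 0.7, (10, 4, 2): 0.1}
print(project(probs, 0, 1, min))  # {(10, 5): 0.3, (10, 4): 0.1}
print(project(probs, 0, 1, max))  # {(10, 5): 0.7, (10, 4): 0.1}
```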

Out of the 30 possible combinations of quantities for our example, we showcase
three to illustrate the added value of the obtained information. First, in
Fig. 6(a), we plot the probabilities of the two objectives vs. the minimum scientific
value that still needs to be accumulated for two different Pareto-optimal
schedulers (left: $\sigma_1$, right: $\sigma_2$). White areas indicate that no epoch for the
particular combination of probabilities is reachable from the tradeoff's bounds.
These two and all other Pareto-optimal schedulers are white above the diagonal,
which means that $obj_{140}$ implies $obj_{100}$, i.e. the objectives are aligned. For the
left scheduler, we further see that all blue-ish areas are associated with lower
probabilities for both objectives. Since blue indicates higher values, this scheduler
achieves only low probabilities when it still needs to make the rover accumulate
a high amount of value. However, it overall achieves higher probabilities for
$obj_{140}$ at medium value requirements, whereas the right scheduler is "safer" and
focuses on satisfying $obj_{100}$. The erratic spikes on the left occur because some
probabilities are only reached after very unlikely paths.

In Fig. 6(b), we show for $\sigma_1$ the probability to achieve $obj_{100}$ depending
on the remaining energy to be spent vs. the remaining scientific value to be
accumulated. We see a white vertical line for every odd x-value; this is because,
over all branches in the model, the gcd of all value costs is 2. The left plot shows
the minimum probabilities over the hidden costs, i.e. we see the probability for
the worst-case remaining time; the right plot shows the best-case scenario. Not
surprisingly, when time is low, only a lot of energy makes it possible to reach
the objective with non-zero probability.
Table 1. Runtime comparison for multi-cost single-objective queries (runtimes in seconds)
Columns: Case Study [ref] | |S| | |T| | r-m | |E| | |S_unf| | Interval It.: UNF-dd, UNF-sp, SEQ | Policy It.: UNF-sp, SEQ
Service 38] | 8-104 | 2-105 | 1-1 162 |6-10° | 47 136 10 |1945 48 
JobSched2 |[34] 349 |660 |2-2 |503 /2-104 <1 <1 <1 |1 <1 
JobSched3 4584 | 1-105 | 2-2 |922 |3-106 4 10 4 26 13 
JobSched5 1-10® | 4-106 | 2-2 |2114 | 4-108 2944 |TO 3220 | TO TO 
FireWire | [36] 776 |1411 |2-2 |6024 |7-105 |7 8 2 274 144 
FireWire 776 |1411 |2-2 |1-105|1-107 |165 147 45 |TO 2803 
Resources |[6] |94 |326 |3-3 | 2-104|6-105 | <1 18 5 46 9 
Resources 94 |326 |3-3  1-107|6-108 TO TO 2693 | TO TO 
Rover 16 |30 |3-3 |9-104}1-10% | 38 24 4 704 106 
Rover 16 |30 |3-3 1-107 |2-108 TO 6040 713 | TO TO 
UAV [23] 1-105 |6-104|1-1 52 |4-104 1 1 4 27 
UAV 1-10° | 6-104 | 1-1 102 | 4-105 16 2 72 46 
Wlan3 [36] 1-105 | 2-105) 1-1 |82 | 3-108 63 8 126 800 
Wlan3 1-10° | 2-105 | 1-1 202 |1-107 820 293 14 | 848 2155 
Wlan6 5-106 | 1-107} 1-1 82 |2107 |12 363 989 |643 TO 
Wlan6 5-106 | 1-107 | 1-1 202 |6-108 |2292 TO 1399 | TO TO 

Table 2. Runtime comparison for multi-cost multi-objective queries (runtimes in seconds)
Columns: Case Study | |S| | |T| | ℓ-r-m | |E| | #w | |S_unf| | Interval It.: UNF-sp, SEQ | Policy It.: UNF-sp, SEQ
Service 8-107 2-107 | 2-1-2 162 34 |6-10°|1918 543 |TO 4679 
JobSched2 |349 660 2-4-4 4-104 2 |1-10° |3 54 | 15 183 
JobSched3 | 4584 1-10° 2-4-4 |1-10° 35 |2-10° |96 TO |6239 TO 
JobSched5 | 1-10° 4-10° | 2-4-4 (3-105? |? TO TO |TO TO 
FireWire |776 |1411/| 2-2-2 |6024/3  |7-10° |32 17 |TO 1159 
FireWire 776 |1411 2-2-2 1-10°}/2 |1-10" |863 225 |TO TO 
Resources 94 326 2-3-4 /2-10° 3 |6-10° | 25 16 |2047 |52 
Resources 94 326 |2-3-4 |1-1081? ? TO TO |TO TO 
Rover 16 30 2-3-3 9-10° 7 1-10° | 177 39 5817 3328 
Rover 16 30 23-3 1-108/7 |2-108 | TO 5785|TO TO 
UAV 1-10° 6-104 2-1-2 52 |18 |4-10* |2 24 |102 1098 
UAV 1-10° 6-10 2-1-2 102 |22 |4-10 |70 39 |2282 3062 
Wlan3 1-10° 2-10ř 3-1-2 82 |68 /|3-10° |5239 2231|TO TO 
Wlan3 1-10° |2-10° 3-1-2 202 |4 |1-107 |1769 185 |TO TO 
Wlan6 5-10° 1-10" | 3-1-2 82 |? |2-10’ |TO TO |TO TO 
Finally, Fig. 6(c) shows the probability for $obj_{140}$ depending on available time
and energy for $\sigma_2$. We plot the minimum probability over the hidden scientific
value requirement, i.e. a worst-case view. The plot shows that time is of little
use in case of low remaining energy, but it helps significantly when there is
sufficient energy, too. In Fig. 6(d), we depict for the same scheduler the minimum
remaining scientific value (z) under which a certain probability for $obj_{100}$ can be
achieved (y), given a certain remaining time budget (x). The upper left corner
shows that a high probability in little time is only achievable if we need to collect
little more value; the value requirement gradually relaxes as we aim for lower
probabilities or have more time.


5 Experiments 


Implementation. We implemented the presented approach in STORM [20] v1.2;
it is available via [19]. The implementation computes extremal probabilities for
single-objective multi-cost bounded queries, as well as Pareto curves for the
multi-objective case. We consider the sparse engine of STORM, i.e., explicit data
structures such as sparse matrices. For single-cost bounded properties, this has
already been addressed in [34]. The expected costs (lines 10 to 12 of Algorithm 1)
are computed either numerically (via interval iteration over finite-precision floats)
or exactly (via policy iteration over infinite-precision rationals). To reduce the
memory consumption, the analysis result of an epoch model $M^e_f$ is erased as soon as possible.
Fig. 7. Runtime (y-axis) of SEQ (+) and UNF (×) for increasing cost bounds (x-axis): (a) Wlan6 (single-obj.), (b) Rover (multi-obj.), (c) Resources (multi-obj.)


Set-up & reproduction. We evaluate the approach on a wide range of case studies,
available in the artefact [30]. The models are given in PRISM's [37] guarded command
language. For each case study we consider single- and multi-objective
queries that yield non-trivial results, i.e., probabilities strictly between zero
and one. We compare the naive unfolding approach (UNF) as in Sect. 3.1 with
the sequential approach (SEQ) as in Sect. 3.2. The unfolding of the model is
applied on the PRISM language level, by considering a parallel composition with
cost counting structures. On the unfolded model we apply the algorithms for
unbounded reachability as available in STORM. We considered precision $\eta = 10^{-4}$
for the Pareto curve approximation and a fixed precision $\varepsilon$ for interval iteration.
We increased the precision for single epoch models as in Theorem 2.

We ran our experiments on a single core (2 GHz) of an HP BL685C G7 system
with 192 GB of memory. We stopped each experiment after a time limit of 2
hours. For experiments that completed within the time limit, we observed a
memory consumption of up to 36 GB for UNF and up to 5 GB for SEQ.

A binary equivalent to the one we used for the experiments is available in
the artefact [30]. It has been tested in the artefact evaluation VM [31].
For other configurations, STORM should be recompiled from the sources [19].

Details on reproducing the tables, as well as on how to analyse multi-cost
bounded properties using STORM in general, can be found in the readme
enclosed in the artefact.


Experimental Results. Tables 1 and 2 show results for single- and multi-objective
queries, respectively. The first columns give the number of states and transitions
of the original MDP, then for the query, the number of bounds m, the
number of different cost structures r, and the number |E| of reachable cost epochs
(reflecting the magnitude of the bound values). $|S_{unf}|$ denotes the number of
reachable states in the unfolding. For multi-objective queries, we additionally
give the number of objectives ℓ and the number of analysed weight vectors w (#w).
The remaining columns depict the runtimes of the different approaches in seconds.
For UNF, we considered both the sparse (sp) and symbolic (dd) engine of
STORM. The symbolic engine supports neither multi-objective model checking
nor exact policy iteration.

On the majority of benchmarks, SEQ performs better than UNF. Typically,
SEQ is less sensitive to increases in the magnitude of the cost bounds, as illustrated
in Fig. 7. For three benchmark and query instances, we plot the runtime
of both approaches against different numbers |E| of reachable epochs. While for
small cost bounds UNF is sometimes even faster than SEQ, SEQ scales
better with increasing |E|. It is not surprising that SEQ ultimately scales better:
the increased state space and the accompanying memory consumption in UNF
are a bottleneck. The most important reason that UNF performs better for some
(smaller) cost bounds is the overhead SEQ incurs by checking each full epoch model.
In particular, an epoch model contains (often many) states that are not reachable
from the initial state (in the unfolding).


6 Conclusion 


Many real-world planning problems consider several limited resources and contain
tradeoffs. This paper presents a practically efficient approach to analyse these
problems. It has been implemented in the STORM model checker and shows significant
performance benefits. The algorithm implicitly computes a large amount
of information that is hidden in the standard plots of Pareto curves shown to
visualise the results of a multi-objective analysis. We have developed a new set of
visualisations that exploit all the available data to provide new and clear insights
to decision makers even for problems with many objectives and cost dimensions.


Data Availability Statement. The datasets analysed during the current 
study, and the binary used for the analysis, are available in the figshare reposi- 
tory [30]. Source code matching the binary is available in [19]. 
Abstract. Statistical model checking avoids the state space explosion
problem in verification and naturally supports complex non-Markovian 
formalisms. Yet as a simulation-based approach, its runtime becomes 
excessive in the presence of rare events, and it cannot soundly analyse 
nondeterministic models. In this tool paper, we present modes: a sta- 
tistical model checker that combines fully automated importance split- 
ting to efficiently estimate the probabilities of rare events with smart 
lightweight scheduler sampling to approximate optimal schedulers in non- 
deterministic models. As part of the MODEST TOOLSET, it supports a 
variety of input formalisms natively and via the JANI exchange format. 
A modular software architecture allows its various features to be flexibly 
combined. We highlight its capabilities with an experimental evaluation 
across multi-core and distributed setups on three exemplary case studies. 


1 Introduction 


Statistical model checking (SMC [80,49]) is a formal verification technique for
stochastic systems. Using a formal stochastic model, specified as e.g. a continuous-time
Markov chain (CTMC) or a stochastic Petri net (SPN), SMC can answer questions
such as "what is the probability of system failure between two inspections" or
"what is the expected time to complete a given workload". It is gaining popularity
for complex applications where traditional exhaustive probabilistic model checking
is limited by the state space explosion problem and by its inability to efficiently handle
non-Markovian formalisms or complex continuous dynamics. At its core, SMC
is the integration of classical Monte Carlo simulation with formal models. By only
sampling concrete traces of the model's behaviour, its memory usage is effectively
constant in the size of the state space, and it is applicable to any behaviour that
can effectively be simulated.

The result of an SMC analysis is an estimate $\hat{q}$ of the actual quantity q together
with a statistical statement on the potential error. A typical guarantee is that, with
probability δ, any $\hat{q}$ will be within $\pm\varepsilon$ of q. To strengthen such a guarantee, i.e.
increase δ or decrease ε, more samples (that is, simulation runs) are needed. Compared
to exhaustive model checking, SMC thus trades memory usage for accuracy or
runtime. A particular challenge lies in rare events, i.e. behaviours of very low probability.
Meaningful estimates need a small relative error: for a probability on the
order of $10^{-k}$, for example, ε should reasonably be on the order of $10^{-(k+1)}$. In a standard
Monte Carlo approach, this would require infeasibly many simulation runs.
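To illustrate how quickly the number of runs grows, the following Python sketch uses the textbook Chernoff-Hoeffding bound: with n independent runs, $P(|\hat{q} - q| \ge \varepsilon) \le 2e^{-2n\varepsilon^2}$, so requiring this to be at most $1 - \delta$ gives $n \ge \ln(2/(1-\delta)) / (2\varepsilon^2)$. This bound is chosen here for illustration only; the text does not say which statistical method modes applies.

```python
import math


def hoeffding_runs(eps: float, delta: float) -> int:
    """Runs n such that the estimate is within +-eps of q with probability at least delta.

    From Hoeffding's inequality: 2 * exp(-2 * n * eps**2) <= 1 - delta.
    """
    return math.ceil(math.log(2 / (1 - delta)) / (2 * eps ** 2))


print(hoeffding_runs(0.01, 0.95))  # 18445
for eps in (1e-3, 1e-6, 1e-9):
    # n grows with 1 / eps**2, hopeless once eps must be tiny
    print(eps, hoeffding_runs(eps, 0.95))
```

Because n scales with $1/\varepsilon^2$, shrinking ε along with a tiny probability makes standard Monte Carlo impractical, which is the motivation for the rare event techniques discussed next.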

SMC naturally works for formalisms with non-Markovian behaviour and com- 
plex continuous dynamics, such as generalised semi-Markov processes (GSMP) 
and stochastic hybrid Petri nets with many generally distributed transitions [42], 
for which the exact model checking problem is intractable or undecidable. As a 
simulation-based approach, however, SMC is incompatible with nondeterminism. 
Yet (continuous and discrete) nondeterministic choices are desirable in formal 
modelling for concurrency, abstraction, and to represent absence of knowledge. 
They occur in many formalisms such as Markov decision processes (MDP) or 
probabilistic timed automata (PTA [88]). In the presence of nondeterminism, 
quantities of interest are defined w.r.t. optimal schedulers (also called policies, 
adversaries or strategies) that resolve all nondeterministic choices: the verifica- 
tion result is the maximum or minimum probability or expected value ranging 
over all schedulers. Many SMC tools that appear to support nondeterministic 
models as input, e.g. PRISM [37] and UPPAAL SMC [14], implicitly use a single 
hidden scheduler by resolving all choices randomly. Results are thus only guaran- 
teed to lie somewhere between minimum and maximum. Such implicit resolutions 
are a known problem affecting the trustworthiness of simulation studies [36]. 

In this paper, we present a statistical model checker, modes, that addresses 
both of the above challenges: It implements importance splitting [45] to efficiently 
estimate the probabilities of rare events and lightweight scheduler sampling [39] 
to statistically approximate optimal schedulers. Both methods can be combined 
to perform rare event simulation for nondeterministic models. 


Rare Event Simulation. The key challenge in rare event simulation (RES) is 
to achieve a high degree of automation for a general class of models. Current 
approaches to automatically derive the importance function for importance split- 
ting, which is critical for the method’s performance, are mostly limited to restricted 
classes of models and properties, e.g. [7,18]. modes combines several importance 
splitting techniques with the compositional importance function construction of 
Budde et al. [5] and two different methods to derive levels and splitting factors [4]. 
These method combinations apply to arbitrary stochastic models with a partly 
discrete state space. We have shown them to work well across different Marko- 
vian and non-Markovian automata- and dataflow-based formalisms [4]. We present 
details on modes’ support for RES in Sect. 3. Alongside PLASMA LAB [40], which
implements automatic importance sampling [33] and semi-automatic importance
splitting [32,34] for Markov chains (with APIs allowing for extensions to other 
models), modes is one of the most automated tools for RES on formal models today. 
In particular, we are not aware of any other tool that provides fully automated RES 
on general stochastic models. 


Nondeterminism. Sound SMC for nondeterministic models is a hard problem. For 
MDP, Brázdil et al. [3] proposed a sound machine learning technique to incremen- 
tally improve a partial scheduler. UPPAAL STRATEGO [13] explicitly synthesises 
a “good” scheduler before using it for a standard SMC analysis. Both approaches 
suffer from worst-case memory usage linear in the number of states as all scheduler 
decisions must be stored explicitly. Classic memory-efficient sampling approaches 
like the one of Kearns et al. [35] address discounted models only. modes implements 
the lightweight scheduler sampling (LSS) approach introduced by Legay et al. [39]. 
It is currently the only technique that applies to reachability probabilities and 
undiscounted expected rewards—as typically considered in formal verification— 
that also keeps memory usage effectively constant in the number of states. Its effi- 
ciency depends only on the likelihood of sampling near-optimal schedulers. modes 
implements the existing LSS approaches for MDP [39] and PTA [10,26] and sup- 
ports unbounded properties on Markov automata (MA [16]). We describe modes’ 
LSS implementation in Sect. 4. 


The modes Tool. modes is part of the MODEST TOOLSET [24], which also includes 
the explicit-state model checker mcsta and the model-based tester motest [21]. 
It inherits the toolset’s support for a variety of input formalisms, including the 
high-level process algebra-based MODEST language [22] and xSADF [25], an 
extension of scenario-aware dataflow. Many other formalisms are supported via 
the JANI interchange format [6]. As simulation is easily and efficiently parallelis- 
able, modes fully exploits multi-core systems, but can also be run in a distributed 
fashion across homogeneous or heterogeneous clusters of networked systems. We 
describe the various methods implemented to make modes a correct and scalable 
statistical model checker that supports classes of models ranging from CTMC 
to stochastic hybrid automata in Sect. 2. We focus on its software architecture
in Sect. 5. Finally, Sect. 6 uses three very different case studies to highlight the
varied kinds of models and analyses that modes can handle. 


Previous Publications. modes was first described in a tool demonstration paper in 
2012 [2]. Its focus was on the use of partial order and confluence reduction-based 
techniques [27] to decide on-the-fly if the nondeterminism in a model is spurious, i.e. 
whether maximum and minimum values are the same and an implicit randomised 
scheduler can safely be used. modes was again mentioned as a part of the MODEST 
TOOLSET in 2014 [24]. Since then, modes has been completely redesigned. The 
partial order and confluence-based methods have been replaced by LSS, enabling 
the simulation of non-spurious nondeterminism; automated importance splitting 
has been implemented for rare event simulation; support for MA and a subset of 
stochastic hybrid automata (SHA [22]) has been added; and the statistical evalu- 
ation methods have been extended and improved. Concurrently, advances in the 
shared infrastructure of the MODEST TOOLSET, now at version 3, provide access to 
new modelling features and formalisms as well as support for the JANI specification. 
2 Ingredients of a Statistical Model Checker


A statistical model checker performs a number of tasks to analyse a given formal 
model w.r.t. a property of interest. In this section, we describe these tasks,
their challenges, and how modes implements them. All random selections in an 
SMC tool are typically resolved by a pseudo-random number generator (PRNG). 
For brevity, we write “random” to mean “pseudo-random” in this section. 


Simulating Different Model Types. The most basic task is simulation: the gen- 
eration of random samples—simulation runs—from the probability distribution 
over behaviours defined by the model. modes contains simulation algorithms 
specifically optimised for the following types of models: 


— For deterministic MDP (Markov decision processes), i.e. DTMC (discrete- 
time Markov chains), simulation is simple and efficient: Obtain the current 
state’s probability distribution over successors, randomly select one of them 
(using the distribution’s probabilities), and continue from that state. 

— Deterministic MA (Markov automata [16]) are CTMC. Here, the situation is 
similar: Obtain the set of enabled outgoing transitions, randomly select a delay 
from the exponential distribution parameterised by the sum of their rates, then 
make a random selection of one transition weighted by the transitions’ rates. 

— PTA (probabilistic timed automata [38]) extend MDP with clock variables, 
transition guards and location invariants as in timed automata. Like MA, 
they are a continuous-time model, but explicitly keep a memory of elapsed 
times in the clocks. They admit finite-state abstractions that preserve reacha- 
bility probabilities and allow them to essentially be simulated as MDP. modes 
implements region graph- and zone-based simulation of PTA as MDP [10, 26]. 
With fewer restrictions, they can also be treated as SHA: 

— SHA extend PTA with general continuous probability distributions and con- 
tinuous variables with dynamics governed by differential equations and inclu- 
sions. modes supports deterministic SHA where all differential equations are 
of the form $\dot{v} = e$ for a continuous variable $v$ and an expression $e$ over discrete
variables. This subset can be simulated without the need for approximations; 
it corresponds to deterministic rectangular hybrid automata [29]. For each 
transition, the SHA simulator needs to compute the set of time points at 
which it is enabled. These sets can be unions of several disjoint intervals, 
which results in relatively higher computational effort for SHA simulation. 
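The DTMC and CTMC cases above can be sketched in Python as follows. This is a minimal illustration, not the simulator of modes: the names `dtmc_step` and `ctmc_step` are hypothetical, and transitions are assumed to be given as explicit lists of (probability, successor) or (rate, target) pairs.

```python
import random


def dtmc_step(distribution, rng):
    """One DTMC step: distribution is a list of (probability, successor) pairs."""
    probabilities = [p for p, _ in distribution]
    successors = [s for _, s in distribution]
    # Select one successor according to the distribution's probabilities.
    return rng.choices(successors, weights=probabilities, k=1)[0]


def ctmc_step(transitions, rng):
    """One CTMC step: transitions is a list of enabled (rate, target) pairs.

    Returns (delay, target): the delay is exponentially distributed with the
    sum of all rates, and the target is chosen with probability rate / total."""
    rates = [rate for rate, _ in transitions]
    targets = [target for _, target in transitions]
    delay = rng.expovariate(sum(rates))
    target = rng.choices(targets, weights=rates, k=1)[0]
    return delay, target
```

Sampling the delay from the total rate and the winner proportionally to the rates is equivalent to racing one exponential clock per transition, but needs only two random draws per step regardless of how many transitions are enabled.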


Properties and Termination. SMC computes a value for the property on every 
simulation run. A run is a finite trace; consequently, standard SMC only works 
for linear-time properties that can be decided on finite traces. modes supports 


— transient (reachability) queries of the form $P(\neg\mathit{avoid}\ \mathrm{U}\ \mathit{goal})$ for the
probability of reaching a set of states characterised by the state formula goal before
entering the set of states characterised by state formula avoid, and 

— expected reward queries of the form E(reward | goal) for the expected 
accumulated reward (or cost) over the reward structure reward when reaching 
a location in the set of states characterised by goal for the first time. 
Transient queries may be time- and reward-bounded. A state formula is an
expression over the (discrete and continuous) variables of the model without
any temporal operators. A reward structure assigns a rate reward $r(s) \in \mathbb{R}$ to
every state $s$ and a branch reward $r(b) \in \mathbb{R}$ to every probabilistic branch $b$ of
every transition. An example transient query is “what is the probability to reach 
a destination (goal) within an energy budget (a reward bound) while avoiding 
collisions (avoid)”. Expected reward queries allow asking for e.g. the expected 
number of retransmissions (the reward) until a message is successfully transmit- 
ted (goal) in a wireless network protocol. Every query q can be turned into a 
requirement $q \sim c$ by adding a comparison ${\sim} \in \{<, >\}$ to a constant value $c \in \mathbb{R}$.

A simulation run ends when the value of a property is decided. For transient 
properties, this is the case when reaching an avoid state or a deadlock (value 0), 
or a goal state (value 1). To ensure termination, the probability of eventually 
encountering one of these events must be 1. modes additionally implements cycle 
detection: it keeps track of a configurable number n of previously visited states.
When a run returns to a previous state without intermediate steps of probability 
<1, it will loop forever on this cycle and the run has value 0. modes uses n = 1 by 
default for good performance while still allowing models built for model checking, 
which avoid deadlocks but often contain terminal states with self-loops, to be 
simulated. For expected rewards, when entering a goal state, the property is 
decided with the value being the sum of the rewards along the run. 
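The termination rules for transient properties can be sketched for a DTMC as below. This is an illustration under assumptions: `transient_run` is a hypothetical helper, the model is given as a successor function, and cycle detection is hard-coded to n = 1, so it only catches a state whose single outgoing branch is a probability-1 self-loop. The `max_steps` guard is an addition of this sketch, not a feature described for modes.

```python
import random


def transient_run(initial, successors, is_goal, is_avoid, rng, max_steps=10**6):
    """Simulate one run for P(not avoid U goal) on a DTMC.

    successors(s) returns a list of (probability, successor) pairs; an empty
    list is a deadlock. Returns 1 if goal is reached first, otherwise 0."""
    s = initial
    for _ in range(max_steps):
        if is_goal(s):
            return 1
        if is_avoid(s):
            return 0
        dist = successors(s)
        if not dist:
            return 0  # deadlock
        if len(dist) == 1 and dist[0][1] == s:
            return 0  # probability-1 self-loop: the run would never be decided
        probabilities = [p for p, _ in dist]
        targets = [t for _, t in dist]
        s = rng.choices(targets, weights=probabilities, k=1)[0]
    raise RuntimeError("run was not decided within max_steps")
```

Terminal states with self-loops, as often found in models built for exhaustive model checking, thus end the run with value 0 after a single step instead of looping forever.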


Statistical Evaluation of Samples. $n$ simulation runs provide a sequence of independent
values $v_1, \ldots, v_n$ for the property. $\hat{v}_n = \frac{1}{n}\sum_{i=1}^{n} v_i$ is an unbiased
estimator of the actual probability or expected reward $v$. An SMC tool must stop
generating runs at some point, and quantify the statistical properties of the
estimate $\hat{v} = \hat{v}_n$ returned to the user. modes implements the following methods:


— For a given half-width $w$ and confidence $\delta$, the CI method returns a confidence
interval $[x, y]$ that contains $\hat{v}$, with $y - x = 2w$. Its guarantee is that, if the
SMC analysis is repeated many times, $100 \cdot \delta\,\%$ of the confidence intervals will
contain $v$. For transient properties, where the $v_i$ are sampled from a Bernoulli
distribution, modes constructs a binomial proportion confidence interval. For
expected rewards, the underlying distribution is unknown, and modes uses the
standard normal (or Gaussian) confidence interval. This relies on the central
limit theorem for means, assuming a “large enough” $n$; modes requires $n > 50$
as a heuristic. modes requires the user to specify $\delta$ plus either $w$ or $n$. If $n$
is not specified, the CI method becomes a sequential procedure: generate runs
until the width of the interval for confidence $\delta$ is below $2w$. The CI method can
be turned into a hypothesis test for requirements $q \sim c$ by checking whether
$c > y$ or $c < x$, and returning undecided if $c$ is inside the interval. When $n$
is unspecified, this is the Chow-Robbins sequential test [44]. Finally, modes
can be instructed to interpret the value of $w$ as a relative half-width, i.e. the
final interval will have width $2w \cdot \hat{v}$. This is useful for rare events.

— The APMC [30] method, based on the Okamoto bound [41], guarantees for
error $\varepsilon$ and confidence $\delta$ that $P(|\hat{v} - v| > \varepsilon) < 1 - \delta$. It only applies to the
Bernoulli-distributed samples for transient properties here. modes requires the user to
specify any two of $\varepsilon$, $\delta$ and $n$, out of which the missing value can be computed.
The APMC method can be used as a hypothesis test for $P(\cdot) \sim c$ by checking
whether $\hat{v} > c + \varepsilon$ or $\hat{v} < c - \varepsilon$, and returning undecided if neither is true.

— modes also implements Wald’s SPRT, the sequential probability ratio test [47]. 
As a sequential hypothesis test, it has no predetermined $n$, but decides on-the-fly
whether more samples are needed as they come in. It is a test for Bernoulli-distributed
quantities, i.e. it only applies to transient requirements of the form
$P(\cdot) \sim c$. For indifference level $\varepsilon$ and error $\alpha$, it stops when the collected samples
so far provide sufficient evidence to decide between $P(\cdot) > c + \varepsilon$ and $P(\cdot) < c - \varepsilon$,
with probability $< \alpha$ of wrongly accepting either hypothesis.
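A sketch of the CI method and its use as a hypothesis test follows. The text only says that modes builds a binomial proportion interval for Bernoulli samples; using the Wilson score interval here is an assumption, as are all function names. The Gaussian interval for expected rewards follows the n > 50 heuristic described above.

```python
import math
from statistics import NormalDist, mean, stdev


def z_value(delta):
    """Two-sided standard normal quantile for confidence delta (e.g. 0.95)."""
    return NormalDist().inv_cdf(0.5 + delta / 2)


def wilson_interval(successes, n, delta):
    """Wilson score interval for a Bernoulli proportion (assumed interval type)."""
    z = z_value(delta)
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def normal_interval(values, delta):
    """Gaussian CI for expected rewards; relies on the CLT, hence n > 50."""
    if len(values) <= 50:
        raise ValueError("the normal approximation needs more than 50 samples")
    half = z_value(delta) * stdev(values) / math.sqrt(len(values))
    centre = mean(values)
    return centre - half, centre + half


def ci_test(interval, c):
    """Hypothesis test for q ~ c: decide only if c lies outside [x, y]."""
    x, y = interval
    if c < x:
        return "greater"    # the whole interval lies above c
    if c > y:
        return "less"       # the whole interval lies below c
    return "undecided"      # c is inside the interval
```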


For a more detailed description of these and other statistical methods and espe- 
cially hypothesis tests for SMC, we refer the interested reader to [44]. 
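Wald's SPRT as described above can be sketched as a running log-likelihood ratio between the hypotheses P(·) ≤ c − ε and P(·) ≥ c + ε. Using the same bound α for both error types, and the name `sprt`, are assumptions of this illustration; it also requires 0 < c − ε and c + ε < 1.

```python
import math


def sprt(samples, c, eps, alpha):
    """Wald's SPRT for a Bernoulli requirement P(.) ~ c.

    H0: p >= c + eps, H1: p <= c - eps; alpha bounds both error types.
    samples yields 0/1 values and is consumed one at a time.
    Returns ("above" or "below", number of samples used)."""
    p0, p1 = c + eps, c - eps
    accept_h1 = math.log((1 - alpha) / alpha)   # upper threshold
    accept_h0 = math.log(alpha / (1 - alpha))   # lower threshold
    inc_one = math.log(p1 / p0)                 # log-likelihood ratio step for a 1
    inc_zero = math.log((1 - p1) / (1 - p0))    # log-likelihood ratio step for a 0
    llr, n = 0.0, 0
    for x in samples:
        n += 1
        llr += inc_one if x else inc_zero
        if llr >= accept_h1:
            return "below", n   # evidence for p <= c - eps
        if llr <= accept_h0:
            return "above", n   # evidence for p >= c + eps
    raise RuntimeError("sample stream ended before a decision was reached")
```

Because the thresholds depend only on α, the number of samples adapts to how far the true probability is from c: values well outside the indifference region are decided after few samples.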


Distributed Sample Generation. Simulation is easily and efficiently parallelisable. 
Yet a naive implementation of the statistical evaluation—processing the values 
from the runs in the order they flow in—risks introducing a bias in a parallel 
setting. Consider estimating the probability of system failure when simulation 
runs that encounter failure states are shorter than other runs, and thus quicker. 
In parallel simulation, failure runs will tend to arrive earlier and more frequently, 
thus overestimating the probability of failure. To avoid such bias, modes uses the 
adaptive schedule first implemented in YMER [48]. It adapts to differences in the 
speed of nodes by scheduling to process more future results from fast nodes when 
current results come in quickly. It always commits to a schedule a priori before 
the actual results arrive, ensuring the absence of bias. It is thus well-suited for 
heterogeneous clusters of machines with significant performance differences. 
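The principle of committing to a result order before the results arrive can be illustrated with a simplified sequentialiser. This sketch deliberately simplifies: it commits to a plain round-robin order over workers, whereas the adaptive schedule of YMER [48] also assigns more slots to faster nodes. The class and method names are hypothetical.

```python
from collections import deque


class Sequentialiser:
    """Releases per-run results only in an order fixed before they arrive.

    Simplification (assumption): a fixed round-robin schedule over workers
    instead of YMER's adaptive schedule; both commit to the order a priori."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.buffers = {w: deque() for w in self.workers}
        self.next_slot = 0  # position in the round-robin schedule

    def submit(self, worker, value):
        """Buffer a result and return the results that may now be released."""
        self.buffers[worker].append(value)
        released = []
        while True:
            w = self.workers[self.next_slot % len(self.workers)]
            if not self.buffers[w]:
                break  # the scheduled worker's result has not arrived yet
            released.append(self.buffers[w].popleft())
            self.next_slot += 1
        return released
```

For example, if a fast worker delivers three quick failure runs before a slow worker delivers one long run, only the first failure is released immediately; the others are held back until the slot scheduled for the slow worker has been filled.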


3 Automated Rare Event Simulation 


With the standard confidence of $\delta = 0.95$, we have $n \approx 1.84/\varepsilon^2$ in the APMC
method: for every decimal digit of precision, the number of runs increases by a
factor of 100. If we attempt to estimate probabilities on the order of $10^{-4}$, i.e.
$\varepsilon \approx 10^{-5}$, we need billions of runs and days or weeks of simulation time. This
is the problem tackled by rare event simulation (RES) techniques [45]. modes
implements RES for transient properties via importance splitting, which iteratively
increases the simulation effort for states “closer” to the goal set. Closeness
is represented by an importance function $f_I \colon S \to \mathbb{N}$ that maps each state in $S$ to
its importance in $\{0, \ldots, \max f_I\}$. The performance, but not the correctness, of
all splitting methods hinges on the quality of the importance function.
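The sample size of the APMC method can be computed directly from the Okamoto bound. A minimal sketch, assuming the usual two-sided form $P(|\hat{v} - v| \geq \varepsilon) \leq 2e^{-2n\varepsilon^2}$ with the right-hand side set to $1 - \delta$; `apmc_runs` is a hypothetical name.

```python
import math


def apmc_runs(eps, delta):
    """Runs needed so that P(|v_hat - v| > eps) < 1 - delta (Okamoto bound)."""
    # Solve 2 * exp(-2 * n * eps**2) = 1 - delta for n and round up.
    return math.ceil(math.log(2 / (1 - delta)) / (2 * eps ** 2))
```

For δ = 0.95 this gives about 1.84/ε² runs, e.g. 18445 runs for ε = 0.01; shrinking ε by a factor of 10 multiplies the count by about 100, and ε = 10⁻⁵ already requires more than 10¹⁰ runs.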


Deriving Importance Functions. Traditionally, the importance function is spec- 
ified ad hoc by a RES expert. Striving for usability by domain experts, modes 
implements the compositional importance function generation method of [5] that 
is applicable to any compositional stochastic model $M = M_1 \parallel \ldots \parallel M_n$ with a
partly discrete state space. We write $s|_i$ for the projection of state $s$ of $M$ to the
discrete local variables of component $M_i$. The method works as follows [4]:
Fig. 1. Illustration of RESTART [4]
Fig. 2. Illustration of fixed effort [4]


1. Convert the goal set formula goal to negation normal form (NNF) and associate
each literal $\mathit{goal}^j$ with the component $M(\mathit{goal}^j)$ whose local state variables
it refers to. Literals must not refer to multiple components.

2. Explore the discrete part of the state space of each component $M_i$. For each
$\mathit{goal}^j$ with $M_i = M(\mathit{goal}^j)$, use reverse breadth-first search to compute the local
minimum distance $f_i^j(s|_i)$ of each state $s|_i$ to a state satisfying $\mathit{goal}^j$.

3. In the syntax of the NNF of goal, replace every occurrence of $\mathit{goal}^j$ by $f_i^j(s|_i)$
with $i$ such that $M_i = M(\mathit{goal}^j)$, and every Boolean operator $\wedge$ or $\vee$ by $+$. Use
the resulting formula as the importance function $f_I(s)$.
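Steps 2 and 3 can be sketched for a goal formula whose literals each refer to one component. Two assumptions go beyond the text: each component's discrete local state space is given as an explicit edge list, and local distances are turned into importance values by subtracting them from the largest finite distance, so that states closer to the literal get higher importance and unreachable states get 0. All names are hypothetical.

```python
from collections import deque


def local_distances(states, edges, literal):
    """Reverse BFS in one component: minimum number of local steps from each
    local state to one satisfying `literal` (None if no such path exists)."""
    preds = {s: [] for s in states}
    for src, dst in edges:
        preds[dst].append(src)          # reversed edges for the backward search
    dist = {s: None for s in preds}
    queue = deque(s for s in preds if literal(s))
    for s in queue:
        dist[s] = 0
    while queue:
        s = queue.popleft()
        for p in preds[s]:
            if dist[p] is None:
                dist[p] = dist[s] + 1
                queue.append(p)
    return dist


def local_importance(dist):
    """Map distances to importance values: higher means closer (assumption)."""
    top = max((d for d in dist.values() if d is not None), default=0)
    return {s: (top - d if d is not None else 0) for s, d in dist.items()}


def compose_importance(literal_importances):
    """Global importance for a goal formula over literals joined by AND/OR:
    each literal contributes its local value and every operator becomes +.
    literal_importances: list of (component index i, local importance map)."""
    def f_I(state):
        return sum(f[state[i]] for i, f in literal_importances)
    return f_I
```

Only the local state spaces are explored, which is why memory usage stays small even when the composed state space explodes.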


The method takes into account both the structure of the goal set formula and 
the structure of the state space. This is in contrast to the approach of Jégourel et 
al. [32], implemented in a semi-automated fashion in PLASMA LAB [34,40], that 
only considers the structure of the (more complex linear-time) logical property. 
The memory usage of the compositional method is determined by the number 
of discrete local states (required to be finite) over all components. Typically, 
component state spaces are small even when the composed state space explodes. 


Levels and Splitting Factors. We also need to specify when and how much to 
“split”, i.e. increase the simulation effort. For this purpose, the values of the 
importance function are partitioned into levels and a splitting factor is chosen for 
each level. Splitting too much too often will degrade performance (oversplitting), 
while splitting too little will cause starvation, i.e. few runs that reach the rare 
event. It is thus critical to choose good levels and splitting factors. Again, to avoid 
the user having to make these choices ad hoc, modes implements two methods 
to compute them automatically. One is based on the sequential Monte Carlo 
splitting technique [8], while the other method, named expected success [4], has 
been newly developed for modes. It strives to find levels and factors that lead, in
expectation, to one run moving up from one level to the next.


Importance Splitting Runs. The derivation of importance function, levels and 
splitting factors is a preprocessing step. Importance splitting then replaces the 
simulation algorithm by a variant that takes this information into account to 
more often encounter the rare event. modes implements three importance splitting
techniques: RESTART, fixed effort and fixed success.

RESTART [46] is illustrated in Fig. 1: As soon as a RESTART run crosses the
threshold into a higher level, $n_\ell - 1$ new child runs are started from the first state
in the new level, where $n_\ell$ is the splitting factor of level $\ell$. When a run moves
below its creation level, it ends. It also ends on reaching an avoid or goal state.
The result of a RESTART run (consisting of a main and several child runs) is
the number of runs that reach goal times $1/\prod_{\ell} n_\ell$, i.e. a rational number $\geq 0$.
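The RESTART bookkeeping can be illustrated on a hypothetical toy model: a gambler's-ruin chain on states 0..N that moves up with probability p and down otherwise, where state 0 is the avoid set, state N the goal set, and the level of a state is the state itself. The splitting factors are chosen by hand here, not by the expected success method, and `restart_run` is a name of this sketch only.

```python
import math
import random


def restart_run(p_up, goal, factors, rng, start=1):
    """One RESTART run on a toy gambler's-ruin chain (hypothetical model).

    States are 0..goal; 0 is the avoid state and `goal` the goal state. The
    level of a state is the state itself, and factors[l] is the splitting
    factor n_l applied on entering level l (1 if absent; no splitting on
    entering the goal). Returns #goal hits / prod(n_l), a rational >= 0."""
    hits = 0
    pending = [(start, start)]          # (current state, creation level)
    while pending:
        s, created = pending.pop()
        while True:
            nxt = s + 1 if rng.random() < p_up else s - 1
            if nxt == goal:
                hits += 1
                break                   # the run ends on reaching goal
            if nxt == 0 or nxt < created:
                break                   # avoid state, or below creation level
            if nxt > s:
                # Threshold crossed into level nxt: start n_l - 1 child runs there.
                for _ in range(factors.get(nxt, 1) - 1):
                    pending.append((nxt, nxt))
            s = nxt
    weight = math.prod(factors.get(l, 1) for l in range(start + 1, goal))
    return hits / weight
```

Averaging many such results estimates the goal probability. For this chain the exact value from state 1 is $(1 - r)/(1 - r^N)$ with $r = (1 - p)/p$, which makes it convenient for checking the estimator.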

Runs of the fixed effort method [17,19], illustrated in Fig. 2, are rather differ- 
ent. They consist of a fixed number of partial runs on each level, each of which 
ends when it crosses into the next higher level or encounters a goal or avoid state. 
When all partial runs for a level have ended, the next round starts from the pre- 
viously encountered initial states of the next higher level. When a fixed effort run 
ends, the fraction of partial runs started in a level that moved up approximates 
the conditional probability of reaching the next level given that the current level 
was reached. If goal states exist only on the highest level, the overall result is 
the product of all of these fractions, i.e. a rational number in [0, 1]. 
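Fixed effort on the same hypothetical gambler's-ruin chain is sketched below. Each level again corresponds to one state, and entry states of a level are reused round-robin, which is one simple choice among several and an assumption of this sketch; `fixed_effort` is a hypothetical name.

```python
import random


def fixed_effort(p_up, goal, effort, rng, start=1):
    """Fixed-effort estimate on the toy gambler's-ruin chain (hypothetical).

    Level l is state l. On each level, `effort` partial runs start from the
    entry states found so far and end on crossing into level l + 1 (success)
    or on hitting the avoid state 0. Returns the product of the fractions."""
    estimate = 1.0
    entries = [start]
    for level in range(start, goal):
        successes = []
        for i in range(effort):
            s = entries[i % len(entries)]           # reuse entry states round-robin
            while 0 < s <= level:
                s = s + 1 if rng.random() < p_up else s - 1
            if s > level:
                successes.append(s)                 # entry state of the next level
        if not successes:
            return 0.0                              # no partial run moved up
        estimate *= len(successes) / effort
        entries = successes
    return estimate
```

Here goal states exist only on the highest level, so the product of the per-level fractions is the estimate of the goal probability.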

Fixed success [1] is a variant of fixed effort that generates partial runs until a 
fixed number of them have reached the next higher level. For all three methods, 
the average of the result of many runs is again an unbiased estimator for the 
probability of the transient property [19]. However, each run is no longer a 
Bernoulli trial. Of the statistical evaluation methods offered by modes, only CI 
with normal confidence intervals is thus applicable. For a deeper discussion of the 
challenges in the statistical evaluation of rare event simulation results, we refer 
the interested reader to [43]. To the best of our knowledge, modes is today the 
most automated rare event simulator for general stochastic models. In particular, 
it defaults to the combination of RESTART with the expected success method 
for level calculation, which has shown consistently good performance [4]. 


4 Scheduler Sampling for Nondeterminism 


Resolving nondeterminism in a randomised way leads to estimates that only lie 
somewhere between the desired extremal values. In addition to computing prob- 
abilities or expected rewards, we also need to find a (near-)optimal scheduler. 


Lightweight Scheduler Sampling. modes implements the lightweight scheduler 
sampling (LSS) approach for MDP of [39] that identifies a scheduler by a single
integer (typically of 32 bits). This allows us to randomly select a large number
$m$ of schedulers (i.e. integers), perform standard or rare event simulation for
each, and report the maximum and minimum estimates over all sampled schedulers
as approximations of the actual extremal values. We show the core of the
lightweight approach, namely performing a simulation run for a given scheduler
identifier $\sigma$, for MDP and transient properties as Algorithm 1. An MDP consists of
a countable set of states $S$, a transition function $T$ that maps each state to a
finite set of probability distributions over successor states, and an initial state $s_0$.

Input: MDP (S, T, s₀), transient property φ, scheduler id σ ∈ ℤ

1  s := s₀, π := s₀
2  while φ(π) = undecided do
3      U_nd.initialise(H(σ.s))                  // use hash of σ and s as seed for U_nd
4      if T(s) = ∅ then return false            // end of run due to deadlock
5      μ := ⌈U_nd · |T(s)|⌉-th element of T(s)   // use U_nd to select transition
6      s' := μ ∘ U_pr.next()                    // use U_pr to select successor state according to μ
7      π := π.s', s := s'                       // append s' to π and continue from s'
8  return φ(π)

Algorithm 1. Simulation for an MDP and a fixed scheduler id [10]

The algorithm uses two PRNGs: $U_{pr}$ to simulate the probabilistic choices (line 6),
and $U_{nd}$ to resolve the nondeterministic ones (line 5). We want $\sigma$ to represent
a deterministic memoryless scheduler: within one simulation run as well as in
different runs for the same value of $\sigma$, $U_{nd}$ must always make the same choice for
the same state $s$. To achieve this, $U_{nd}$ is re-initialised with a seed based on $\sigma$ and
$s$ in every step (line 3). The overall effectiveness of the lightweight approach only
depends on the likelihood of selecting a $\sigma$ that represents a (near-)optimal scheduler.
We want to sample “uniformly” from the space of all schedulers to avoid
actively biasing against “good” schedulers. Algorithm 1 achieves this naturally
for MDP.
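The simulation loop for a fixed scheduler id and the sampling of scheduler ids can be sketched in Python as follows. Assumptions of this illustration: the hash H is SHA-256 over a string encoding of σ and s, U_nd is Python's `random.Random` re-seeded from that hash, the transition index is taken as ⌊U_nd · |T(s)|⌋ (0-based), and `nd_index`, `lss_run` and `lss_max` are hypothetical names. The statistical corrections for multiple schedulers discussed below are omitted.

```python
import hashlib
import random


def nd_index(sigma, state, k):
    """Scheduler decision in `state`: re-seed U_nd from a hash of (sigma, state),
    so the same sigma always picks the same transition in the same state."""
    digest = hashlib.sha256(f"{sigma}:{state}".encode()).digest()
    u_nd = random.Random(int.from_bytes(digest[:8], "big"))
    return int(u_nd.random() * k)       # floor(U_nd * |T(s)|), 0-based


def lss_run(mdp, s0, is_goal, is_avoid, sigma, u_pr, max_steps=10**6):
    """One run for P(not avoid U goal) under scheduler id sigma.

    mdp maps a state to a list of transitions; each transition is a list of
    (probability, successor) pairs. Returns True iff goal is reached."""
    s = s0
    for _ in range(max_steps):
        if is_goal(s):
            return True
        if is_avoid(s):
            return False
        transitions = mdp.get(s, [])
        if not transitions:
            return False                # end of run due to deadlock
        mu = transitions[nd_index(sigma, s, len(transitions))]
        probs = [p for p, _ in mu]
        succs = [t for _, t in mu]
        s = u_pr.choices(succs, weights=probs, k=1)[0]   # U_pr resolves mu
    raise RuntimeError("run still undecided after max_steps")


def lss_max(mdp, s0, is_goal, is_avoid, m, n, seed=0):
    """Sample m 32-bit scheduler ids, estimate each with n runs, keep the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(m):
        sigma = rng.getrandbits(32)
        hits = sum(lss_run(mdp, s0, is_goal, is_avoid, sigma, rng) for _ in range(n))
        if best is None or hits / n > best[1]:
            best = (sigma, hits / n)
    return best
```

No scheduler decision is ever stored: each one is recomputed from σ and the current state, which keeps memory usage independent of the number of states.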


Beyond MDP. LSS can be adapted to any model and type of property where 
the class of optimal schedulers only uses discrete input to make its decision 
for every state [26]. This is obviously the case for discrete-space discrete-time 
models like MDP. It means that LSS can directly be applied to MA and time- 
unbounded properties, too. In addition to MDP and MA, modes also supports 
two LSS methods for PTA, based on a variant of forwards reachability with 
zones [10] and the region graph abstraction [26], respectively. While the former 
includes zone operations with worst-case runtime exponential in the number of 
clocks, the latter implements all operations in linear time. It exploits a novel data 
structure for regions based on representative valuations that performs very well 
in practice [26]. Extending LSS to models with general continuous probability 
distributions such as stochastic automata [11] is hindered by optimal sched- 
ulers requiring non-discrete information (the values and expiration times of all 
clocks [9]). modes currently provides prototypical LSS support for SA encoded in 
a particular form and various restricted classes of schedulers as described in [9]. 


Bounds and Error Accumulation. The results of an SMC analysis with LSS are 
lower bounds for maximum and upper bounds for minimum values up to the 
specified statistical error and confidence. They can thus be used to e.g. disprove 
safety (the maximum probability to reach an unsafe state is above a threshold) 
or prove schedulability (there is a scheduler that makes it likely to complete 
the workload in time), but not the opposite. The accumulation of statistical 
error introduced by the repeated simulation experiments over multiple schedulers
must also be accounted for. [12] shows how to modify the APMC method
accordingly and turn the SPRT into a correct sequential test over schedulers. In
addition to these, modes allows the CI method to be used with LSS by applying
the standard Šidák correction for multiple comparisons. This enables LSS for
expected rewards and RES. All the adjustments essentially increase the required
confidence depending on the (maximum) number of schedulers to be sampled.
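The Šidák correction amounts to a one-line computation: to keep an overall confidence δ across m independent per-scheduler intervals, each interval is built with confidence δ^(1/m). A minimal sketch; the function name is hypothetical.

```python
def sidak_confidence(delta, m):
    """Per-scheduler confidence so that m independent intervals all hold
    jointly with confidence delta (Šidák correction)."""
    return delta ** (1.0 / m)
```

For δ = 0.95 and m = 10, each interval needs a confidence of about 0.995, which illustrates how the required confidence grows with the number of sampled schedulers.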


Two-Phase and Smart Sampling. If an SMC analysis for fixed statistical param- 
eters would need n runs on a deterministic model, it will need significantly more 
than $m \cdot n$ runs for a nondeterministic model when $m$ schedulers are sampled
due to the increase in the required confidence. modes implements a two-phase 
approach and smart sampling [12] to reduce this overhead. The former’s first 
phase consists of performing n simulation runs for each of the m schedulers. The 
scheduler that resulted in the maximum (or minimum) value is selected, and 
independently evaluated once more with n runs to produce the final estimate. 
The first phase is a heuristic to find a near-optimal scheduler before the second 
phase estimates the value under this scheduler according to the required statis- 
tical parameters. Smart sampling generalises this principle to multiple phases, 
dropping only the “worst” half of the evaluated schedulers between phases. It 
starts with an informed guess of good initial values for n and m. For details, 
see [12]. Smart sampling tends to find more extremal schedulers faster while the 
two-phase approach has predictable performance as it always needs $(m + 1) \cdot n$
runs. We thus use the two-phase approach for our experiments in Sect. 6. 
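The two-phase approach can be sketched on top of any per-scheduler estimator. Here `evaluate(sigma, n)` is a hypothetical callback returning an estimate from n runs under scheduler id σ; the sketch shows where the fixed cost of (m + 1) · n runs comes from and leaves out the statistical corrections.

```python
import random


def two_phase(evaluate, m, n, rng=None, maximise=True):
    """Two-phase LSS sketch; evaluate(sigma, n) returns an estimate from n runs."""
    if rng is None:
        rng = random.Random()
    sigmas = [rng.getrandbits(32) for _ in range(m)]
    # Phase 1: screen each of the m sampled schedulers with n runs.
    screened = [(evaluate(sigma, n), sigma) for sigma in sigmas]
    pick = max if maximise else min
    _, best = pick(screened, key=lambda pair: pair[0])
    # Phase 2: n fresh, independent runs for the selected scheduler give the
    # final estimate, so the total cost is (m + 1) * n runs.
    return best, evaluate(best, n)
```

Re-evaluating the selected scheduler with fresh runs avoids reporting the phase-1 maximum itself, which is biased towards overestimation because it is the best of m noisy estimates.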


5 Architecture and Implementation 


modes is implemented in C# and works on Linux, Mac OS X and Windows sys- 
tems. It builds on a solid foundation of shared infrastructure with other tools of 
the MODEST TOOLSET. This includes input language parsers that map MODEST, 
xSADF and JANI input into a common internal metamodel for networks of 
stochastic hybrid automata with rewards and discrete variables. Before simu- 
lation, every model is compiled to bytecode, making the metamodel executable. 
The same compilation engine is also used by the mcsta and motest tools. 

The architecture of the SMC-specific part of modes is shown as a class dia- 
gram in Fig. 3. Boxes represent classes, with rounded rectangles for abstract 
classes and invisible boxes for interfaces. Solid lines are inheritance relations. 
Dotted lines are associations, with double arrows for collection associations. The 
architecture mirrors the three distinct tasks of a statistical model checker: the 
generation of individual simulation runs and per-run evaluation of properties, 
implemented in modes by RunGenerator and RunEvaluator, respectively; the 
coordination of simulation over multiple threads across CPU cores and networked 
machines, implemented by classes derived from Worker and IWorkerHost; and 
the statistical evaluation of simulation runs, implemented by PropertyEvaluator. 

The central component of modes’ architecture is the Master. It compiles the
model, derives the importance function, sends both to the workers (on the same
or different machines), and instantiates a PropertiesJob for every partition of the
properties to be analysed that can share simulation runs¹. Each PropertiesJob
then posts simulation jobs back to the master in parallel or in sequence. A simulation
job is a description of how to generate and evaluate runs: which run type
(i.e. which class derived from RunGenerator) to use, whether to wrap it in an importance
splitting method, whether to simulate for a specific scheduler id, which compiled
expressions to evaluate to determine termination and the values of the runs, etc.
The master allocates posted jobs to available simulation threads offered by the
workers, and notifies workers when a job is scheduled for one of their threads.
As the result for an individual run is handed from the RunEvaluator by the
RunGenerator via the workers to the master, it is fed into a Sequentialiser that
implements the adaptive schedule for bias avoidance. Only after that, possibly
at a later point, is it handed on to the PropertiesJob for statistical evaluation.

¹ Using the same set of runs for multiple properties is an optimisation at the cost of
statistical independence. modes can also generate independent runs for each property.

Fig. 3. The software architecture of the modes statistical model checker

For illustration, consider a PropertiesJob for LSS with 10 schedulers, RES 
with RESTART, and the expected success method for level calculation. It is given 
the importance function by the master, and its first task is to compute the levels. 
It posts a simulation job for fixed effort runs with level information collection to 
the master. Depending on the current workload from other PropertiesJobs, the 
master will allocate many threads to this job. Once enough results have come 
in, the PropertiesJob terminates the simulation job, computes the levels and 
splitting factors, and starts with the actual simulations: It selects 10 random 
scheduler identifiers and concurrently posts for each of them a simulation job for 
RESTART runs. The master will try to allocate available threads evenly over these 
jobs. As results come in, the evaluation may finish early for some schedulers, at 
which point the master will be instructed to stop the corresponding simulation 
job. It can then allocate the newly free threads to other jobs. This scheme results 
in a maximal exploitation of the available parallelism across workers and threads. 

Due to the modularity of this architecture, it is easy to extend modes in 
different ways. For example, to support a new type of model (say, non-linear 
hybrid automata) or a new RES method, only a new (I)RunGenerator needs to 
be implemented. Adding another statistical evaluation method from [44] means 
adding a new PropertyEvaluator, and so on. 

In distributed simulation, an instance of modes is started on each node with 
the --server parameter. This results in the creation of an instance of the Server 
class instead of a Master, which listens for incoming connections. Once all servers 
are running, a master can be started with a list of hosts to connect to. modes 
comes with a template script to automate this task on SLURM-based clusters. 


6 Experiments 


We present three case studies in this section. They have been chosen to highlight 
modes’ capabilities in terms of the diverse types of models it supports, its ability 
to distribute work across compute clusters, and the new analyses possible with 
RES and LSS. None of them has been studied before with modes or the combi- 
nations of methods that we apply here. Our experiments ran on an Intel Core 
i7-4790 workstation (3.6-4.0 GHz, 4 cores), a homogeneous cluster of 40 AMD 
Opteron 4386 nodes (3.1-3.8 GHz, 8 cores), and an inhomogeneous cluster of 15 
nodes with different Intel Xeon processors. All systems run 64-bit Linux. We use 
1, 2 or 4 simulation threads on the workstation (denoted “1”, “2” and “4” in 
our tables), and n nodes with t simulation threads each on the clusters (denoted
“n × t”). We used a one-hour timeout, marked “—” in the tables. Note that
runtimes cannot directly be compared between the workstation and the clusters.

Table 1. Performance and scalability on the electric vehicle charging case study

                n_fail = 2          n_fail = 3          n_fail = 4          n_fail = 5
                MC        RES       MC        RES       MC        RES       MC        RES
conf. interval  [6.46e-2, 7.85e-2]  [5.26e-3, 6.46e-3]  [2.75e-4, 3.28e-4]  [8.35e-6, 1.0?e-5]
1               2 s       4 s       30 s      19 s      585 s     206 s     —         —
2               1 s       2 s       15 s      11 s      315 s     101 s     —         —
4               1 s       1 s       8 s       5 s       163 s     69 s      —         —
5 × 4           1 s       1 s       4 s       4 s       69 s      23 s      2241 s    496 s
5 × 8           1 s       2 s       2 s       3 s       40 s      16 s      1238 s    328 s
40 × 2          0 s       1 s       1 s       2 s       16 s      8 s       483 s     135 s
20 × 8          0 s       2 s       1 s       2 s       10 s      6 s       314 s     105 s
40 × 8          0 s       2 s       1 s       3 s       5 s       6 s       159 s     64 s


Electric Vehicle Charging. We first consider a model of an electric vehicle charg- 
ing station. It is a MODEST model adapted from the “extended” case study 
of [42]: a stochastic hybrid Petri net with general transitions, which in turn is 
based on the work in [31]. The scenario we model is of an electric vehicle being 
connected to the charger every evening in order to be recharged the next morn- 
ing. The charging process may be delayed due to high load on the power grid, and 
the exact time at which the vehicle is needed in the morning follows a normal 
distribution. We consider one week of operation and compute the probability
that the desired level of charge is not reached on any N_fail of the seven
mornings, for N_fail ∈ {2, ..., 5}.

This model is not amenable to exhaustive model checking due to the non- 
Markovian continuous probability distributions and the hybrid dynamics mod- 
elling the charging process. However, it is deterministic. We thus applied modes 
with standard Monte Carlo simulation (MC) as well as with RES using RESTART. 
We performed the same analysis on different configurations of the workstation 
and the homogeneous cluster. To compare MC and RES, we use CI with δ = 0.95
and a relative half-width of 10% for both. All other parameters of modes are set
to default values, which implies an automatic compositional importance function 
and the expected success method to determine levels and splitting factors. The 
results are shown in Table 1. Row “conf. interval” gives the average confidence 
intervals that we obtained over all experiments. 
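As a rough illustration of this kind of sequential stopping rule (simulate until the confidence interval reaches the requested relative half-width), the Python sketch below keeps sampling until a normal-approximation interval at confidence δ is narrow enough. The interval type is an assumption made for illustration: modes may use a different CI construction, and `sequential_mc` is a hypothetical name.

```python
import math
import random
from statistics import NormalDist


def sequential_mc(sample, delta=0.95, rel_half_width=0.1, min_runs=100, max_runs=10_000_000):
    """Simulate until the CI half-width is at most rel_half_width * estimate.

    `sample()` performs one simulation run and returns True if the property holds.
    The interval is the normal-approximation (Wald) interval: an illustrative
    assumption, not necessarily the interval computed by modes.
    """
    z = NormalDist().inv_cdf(0.5 + delta / 2)  # two-sided quantile, about 1.96 for 0.95
    successes = 0
    for n in range(1, max_runs + 1):
        if sample():
            successes += 1
        if n >= min_runs and successes > 0:
            p_hat = successes / n
            half = z * math.sqrt(p_hat * (1 - p_hat) / n)
            if half <= rel_half_width * p_hat:
                return p_hat, (p_hat - half, p_hat + half), n
    raise RuntimeError("run budget exhausted before reaching the requested precision")


# Usage: a stand-in "simulation" whose property holds with probability 0.05.
rng = random.Random(42)
estimate, interval, runs = sequential_mc(lambda: rng.random() < 0.05)
```

For an event of probability 0.05, this rule needs on the order of z²(1 − p)/(0.1² · p) ≈ 7300 runs, and the count grows inversely with the probability, which is why plain MC becomes expensive for rare events.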

RES starts to noticeably pay off as soon as probabilities are on the order of
10⁻⁴. The runtime of RESTART is known to heavily depend on the levels and
splitting factors, and we indeed noticed large variations in runtime for RES over 
several repetitions of the experiments. The runtimes for RES should thus not be 
used to judge the speedup w.r.t. parallelisation. However, when looking at the 
MC runtimes, we see good speedups as we increase the number of threads per 
node, and near-ideal speedups as we increase the total number of nodes, as long 
as there is a sufficient amount of work. 
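To make the splitting mechanism behind RESTART concrete, the sketch below applies it to a toy model chosen only for illustration (it is not one of the case studies): a random walk on {0, ..., top} that steps up with probability p, where the rare event is reaching `top` before 0 from state 1. Each upward crossing of a threshold spawns `factor − 1` retrials, a retrial is discarded when it falls below the threshold at which it was created, and every hit is weighted by the inverse product of the splitting factors. Thresholds and the splitting factor are fixed by hand here, whereas modes derives levels and splitting factors automatically; all names are hypothetical.

```python
import random


def restart_estimate(p, top, thresholds, factor, n_main, seed=0):
    """RESTART estimate of P(reach `top` before 0 | start in 1) for a +1/-1 random walk.

    The importance of a state is the state itself; `thresholds` must be increasing and
    lie strictly between 1 and `top`; `factor` is the splitting factor used at each one.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_main):
        # Pending trials as (state, birth): birth = -1 marks the main trial, otherwise it
        # is the index of the threshold whose upward crossing created this retrial.
        pending = [(1, -1)]
        while pending:
            x, birth = pending.pop()
            while True:
                if x == top:
                    hits += 1
                    break
                if x == 0:
                    break
                if birth >= 0 and x < thresholds[birth]:
                    break  # retrial fell below its creation threshold: discard it
                old = x
                x = x + 1 if rng.random() < p else x - 1
                for j, level in enumerate(thresholds):
                    if old < level <= x:
                        # upward crossing of threshold j: add factor - 1 retrials here
                        pending.extend([(x, j)] * (factor - 1))
    # Every path reaching `top` has crossed all thresholds, so each hit weighs factor**-M.
    return hits / (n_main * factor ** len(thresholds))


# Usage: p = 0.3, top = 10; the exact value (gambler's ruin) is (r - 1) / (r**10 - 1), r = 7/3.
estimate = restart_estimate(0.3, 10, [2, 4, 6, 8], factor=6, n_main=10_000, seed=1)
```

The exact probability here is about 2.79 · 10⁻⁴, so plain Monte Carlo with the same 10 000 runs would expect fewer than three hits, while the split trials produce thousands of (weighted) hits.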
Table 2. Performance and results for the low-latency wireless network case study

          m      time     P(◇≤4 failed)     P(◇≤4 offline₁)    P(◇≤4 offline₂)
optimal                   [0.028, 0.472]    [0.026, 0.269]     [0, 0.424]
1         100    3523s
2         100    2045s    [0.041, 0.363]    [0.030, 0.189]     [0.000, 0.309]
4         100    1205s
20 × 8    1000   607s     [0.033, 0.383]    [0.028, 0.242]     [0.000, 0.327]
40 × 8    1000   308s

(The intervals are shared by all rows with the same m.)


Although the electric vehicle charging model was not designed with RES in mind
and has only moderately rare events, the fully automated methods of modes could
be applied directly, and they significantly improved performance. For a detailed
experimental comparison of the RES methods implemented in modes on a larger set
of examples, including events with probabilities as low as 4.8 · 10⁻²³, we refer the
reader to [4].


Low-latency Wireless Networks. We now turn to the PTA model of a low-latency 
wireless networking protocol being used among three stations, originally pre- 
sented in [15]. We take the original model, increase the probability of message 
loss, and make one of the communication links nondeterministically drop mes- 
sages. This allows us to study the influence of the message loss probabilities and 
the protocol’s robustness to adversarial interference. The model is amenable to 
model checking, as demonstrated in [15]. It allows us to show that modes can 
be applied to such models originally built for traditional verification, and since 
we can calculate the precise maximum and minimum values of all properties via 
model checking, we have a reference to evaluate the results of LSS. 

We show the results of using modes with LSS on this model in Table 2. Row 
“optimal” lists the maximum and minimum probabilities computed via model 
checking for three properties: the probability that the protocol fails within four 
iterations, and that either the first or the second station goes offline. We used 
the two-phase LSS method with m = 100 schedulers on the workstation, and 
with m = 1000 schedulers on the homogeneous cluster. The intervals are the 
averages of the min. and max. values returned by all analyses. The statistical 
evaluation is APMC with δ = 0.95 and ε = 0.0025, which means that 59556
simulation runs are needed per scheduler. 
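The stated run count can be checked against the Okamoto (Chernoff–Hoeffding) bound on which APMC-style estimation relies, n = ⌈ln(2/δ) / (2ε²)⌉. The helper below is a hypothetical illustration, not an API of modes; substituting δ = 0.95 and ε = 0.0025 into this expression yields the 59556 runs quoted above.

```python
import math


def okamoto_runs(delta, eps):
    """Smallest n with 2 * exp(-2 * n * eps**2) <= delta.

    By the Okamoto / Chernoff-Hoeffding inequality, n such runs guarantee
    P(|p_hat - p| >= eps) <= delta for the estimate p_hat of a probability p.
    """
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


print(okamoto_runs(0.95, 0.0025))  # 59556
```

Since the run count scales with 1/ε², halving ε quadruples the number of runs per scheduler.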

Near-optimal schedulers for the minimum probabilities do not appear to be 
rare: we find good bounds for the minima even with 100 schedulers. However, 
for maximum probabilities, sampling more schedulers pays off in terms of better 
approximations. In all cases, the results are conservative approximations of the 
actual optima (as expected), and they are clearly more useful than the single 
value that would be obtained by other tools via a (hidden) randomised scheduler. 
Performance scales ideally with parallelism on the cluster, and still linearly on 
the workstation. For a deeper evaluation of the characteristics of LSS, including 
experiments on models too large for model checking, we refer the reader to the 
description of the original approach [12,39] and its extensions to PTA [10,26]. 
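The core idea of lightweight scheduler sampling is that a scheduler is identified by a small integer and resolves each nondeterministic choice by hashing that integer together with the current state, so schedulers can be sampled without ever being stored. The Python sketch below illustrates a simplified two-phase variant on a hypothetical one-decision MDP; the SHA-256-based choice function, the toy model and all names are assumptions, not the implementation in modes.

```python
import hashlib
import random

# Hypothetical toy MDP: in its only nondeterministic state "s0" a scheduler picks action 0
# (goal reached with probability 0.3) or action 1 (goal reached with probability 0.6).
GOAL_PROB = (0.3, 0.6)


def decide(sigma, state, n_actions):
    """Lightweight scheduler: the choice is a fixed hash of (scheduler id, state)."""
    digest = hashlib.sha256(f"{sigma}:{state}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_actions


def simulate(sigma, rng):
    """One simulation run under scheduler sigma; True if the goal is reached."""
    action = decide(sigma, "s0", len(GOAL_PROB))
    return rng.random() < GOAL_PROB[action]


def estimate(sigma, runs, rng):
    return sum(simulate(sigma, rng) for _ in range(runs)) / runs


def two_phase_lss(m, runs_phase1, runs_phase2, seed=0):
    """Return (min, max) probability estimates over m sampled schedulers."""
    rng = random.Random(seed)
    schedulers = [rng.getrandbits(32) for _ in range(m)]
    # Phase 1: cheap estimates for every sampled scheduler
    score = {s: estimate(s, runs_phase1, rng) for s in schedulers}
    best_min = min(schedulers, key=score.__getitem__)
    best_max = max(schedulers, key=score.__getitem__)
    # Phase 2: fresh, independent runs for the two selected schedulers only
    return estimate(best_min, runs_phase2, rng), estimate(best_max, runs_phase2, rng)


low, high = two_phase_lss(m=64, runs_phase1=200, runs_phase2=5000, seed=3)
```

Because only sampled schedulers are inspected, the returned pair approximates the true extrema from the inside (up to statistical error), which matches the conservative results described above; in this two-action toy, both optimal choices are almost certainly among the 64 samples.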
Table 3. Performance and results for the reliable database system case study

      uniform scheduler                     lightweight scheduler sampling (20)
R     MC      RES     conf. interval        MC      RES      min. conf. int.      max. conf. int.
2     1s      4s      [1.5E-2, 1.8E-2]      4s      31s      [1.4E-2, 1.7E-2]     [1.5E-2, 1.9E-2]
3     8s      3s      [1.0E-4, 1.3E-4]      181s    26s      [7.9E-5, 9.6E-5]     [1.3E-4, 1.6E-4]
4     816s    13s     [9.3E-7, 1.1E-6]      —       221s     [6.3E-7, 7.6E-7]     [1.3E-6, 1.6E-6]
5     —       229s    [1.1E-8, 1.3E-8]      —       3072s    [6.2E-9, 7.6E-9]     [1.6E-8, 2.0E-8]


Redundant Database System. The redundant database system [20] is a classic 
RES case study. It models a system consisting of six disk clusters of R + 2 
disks each plus two types of processors and disk controllers with R copies of 
each type. Component lifetimes are exponentially distributed. Components fail 
in one of two modes with equal probability, each mode having a different repair 
rate. The system is operational as long as fewer than R processors of each type, 
R controllers of each type, and R disks in each cluster are currently failed. The 
model is a CTMC with a state space too large and a transition matrix too dense 
for it to be amenable to model checking with symbolic tools like PRISM [37]. 

In the original model, any number of failed components can be repaired in 
parallel. We consider this unrealistic, and extend the model by a repairman 
that can repair a single component at a time. If more than one component fails 
during a repair, then as soon as the current repair is finished, the repairman has 
to decide which to repair next. Instead of enforcing a particular repair policy, we 
leave this decision as nondeterministic. The model thus becomes an MA. We use 
LSS in combination with RES to investigate the impact of the repair policy. We 
study the scenario where one component of each kind (one disk, one processor, 
one controller) is in failed state, and estimate the probability for system failure 
before these components are repaired. The minimum probability is achieved by a 
perfect repair strategy, while the maximum results from the worst possible one. 

Table 3 shows the results of our LSS-plus-RES analysis with modes using
default RES parameters and sampling m = 20 schedulers. Due to the complexity 
of the model, we ran this experiment on the inhomogeneous cluster only, using 16 
cores on each node for 240 concurrent simulation threads in total. We see that 
RES needs a somewhat rare event to improve performance. We also compare 
LSS to the uniform randomised scheduler (as implemented in many other SMC 
tools). It results in a single confidence interval for the probability of failure. 
With LSS, we get two intervals. They do not overlap when R ≥ 3, i.e. the repair
strategy matters: a bad strategy makes failure approximately twice as likely as a 
good strategy! Since the results of LSS are conservative, the difference between 
the worst and the best strategy may be even larger. 


Experiment Replication. To enable independent replication of our experimental 
results, we have created a publicly available evaluation artifact [23]. It contains 
the version of modes and the model files used for our experiments, the raw 
experimental results, summarising tabular views of those results (from which we 
derived Tables 1, 2 and 3), and a Linux shell script to automatically replicate a 
subset of the experiments. Since the complete experiments take several hours to
complete and require powerful hardware and computer clusters, we have selected 
a subset for the replication script. Using the virtual machine of the TACAS 
2018 Artifact Evaluation [28] on typical workstation hardware of 2017, it runs 
to completion in less than one hour while still substantiating our main results. 


7 Conclusion 


We presented modes, the MODEST TOOLSET’s distributed statistical model 
checker. It provides methods to tackle both of the prominent challenges in simu- 
lation: nondeterminism and rare events. Its modular software architecture allows 
its various features to be easily combined and extended. For the first time, we 
used lightweight scheduler sampling with Markov automata, and combined it 
with rare event simulation to gain insights into a challenging case study that, 
currently, cannot be analysed for the same aspects with any other tool that we 
are aware of. modes is available for download at www.modestchecker.net. 
Abstract. We introduce a natural notion of limit-deterministic parity
automata and present a method that uses such automata to construct
satisfiability games for the weakly aconjunctive fragment of the
μ-calculus. To this end we devise a method that determinizes
limit-deterministic parity automata of size n with k priorities through
limit-deterministic Büchi automata to deterministic parity automata
of size O((nk)!) and with O(nk) priorities. The construction relies on
limit-determinism to avoid the full complexity of the Safra/Piterman
construction by using partial permutations of states in place of Safra
trees. By showing that limit-deterministic parity automata can be
used to recognize unsuccessful branches in pre-tableaux for the weakly
aconjunctive μ-calculus, we obtain satisfiability games of size O((nk)!)
with O(nk) priorities for weakly aconjunctive input formulas of size n
and alternation-depth k. A prototypical implementation that employs a
tableau-based global caching algorithm to solve these games on-the-fly
shows promising initial results.


1 Introduction 


The modal μ-calculus [15] is an expressive logic for reasoning about
concurrent systems. Its satisfiability problem is EXPTIME-complete [5]. Due to
nesting of fixpoints, the semantic structure of the μ-calculus is quite involved,
which is reflected in the high degree of sophistication of reasoning algorithms
for the μ-calculus. One convenient modular approach is the definition of suitable
satisfiability games (e.g. [10]); solving such games (i.e. computing their
winning regions) then amounts to deciding the satisfiability of the input formulas.
A standard method for obtaining satisfiability games is to first construct a track- 
ing automaton that accepts the bad branches in a pre-tableau for the input for- 
mula, i.e. those that infinitely defer satisfaction of a least fixpoint; this automaton 
then is determinized and complemented, and the satisfiability game is built over 
the carrier set of the resulting automaton. The moves in the game are those 
transitions from the automaton that correspond to applications of tableau-rules; 
the existence of a winning strategy in this game ensures the existence of a model, 
i.e. a locally coherent structure that does not contain bad branches. As
determinization typically incurs exponential blowup, good determinization
procedures for automata on infinite words play a crucial role in standard
decision procedures for the satisfiability problem of the μ-calculus and its
fragments; in particular, better determinization procedures lead to smaller
satisfiability games, which are easier to solve.

The weakly aconjunctive μ-calculus [15,24] restricts occurrences of recursion
variables in conjunctions but is still quite expressive; e.g., it can define winning
regions in parity games with a bounded number of priorities [4]. The key
observation for the present paper is that in the weakly aconjunctive case, pre-tableau
branches are made ‘bad’ by a single formula; this implies that the tracking 
automaton for such formulas is limit-deterministic, i.e. that it is sufficient to 
deterministically track a single formula from some point on. This motivates a 
notion of limit-deterministic parity automata in which all accepting runs are 
deterministic from some point on. Because the nondeterminism is restricted to 
finite prefixes of accepting runs in such automata, they can be determinized in 
a simpler way than unrestricted parity automata. We present a reformulation 
of a recent determinization method for limit-deterministic Büchi automata [6].
The method is inspired by, but significantly less involved than the more general 
Safra/Piterman construction [19,20], essentially due to the fact that the tree 
structure of Safra trees collapses, leaving only the permutation structure. The 
resulting parity automaton can thus be described as a permutation automaton. 
The method yields deterministic parity automata with O(n!) states, compared to
O((n!)²) in the Safra/Piterman construction. Crucially, we show that we obtain
a similarly simplified determinization for limit-deterministic parity automata by
translating into Büchi automata.

As indicated above, limit-deterministic parity automata are able to recognize
bad branches in pre-tableaux for weakly aconjunctive μ-calculus formulas.
Employing them in the standard construction of satisfiability games, we obtain 
permutation games in which nodes from the pre-tableau are annotated with a 
partial permutation (i.e. a non-repetitive list) of (levelled) formulas. A parity 
condition is used to detect indices in the permutation that are active infinitely 
often without ever being removed from the permutation. The resulting parity 
games are of size O((nk)!) and have O(nk) priorities; as a side result, we thus 
obtain a new bound O((nk)!) on model size for weakly aconjunctive formulas. 

The resulting decision procedure generalizes to the weakly aconjunctive 
coalgebraic μ-calculus, thus covering also, e.g., probabilistic and alternating-time
versions of the μ-calculus. The generic algorithm has been implemented
as an extension of the Coalgebraic Ontology Logic Reasoner (COOL) [11,13]. 
Our implementation constructs and solves the presented permutation games on- 
the-fly, possibly finishing satisfiability proofs early, and shows promising initial 
results. The content of the paper is structured as follows: We describe the
determinization of limit-deterministic automata in Sect. 2 and the construction of
permutation games in Sect. 3, and discuss implementation and evaluation in
Sect. 4.
Related Work. Liu and Wang [17] give a tighter estimate O((n!)²) for the
number of states in Piterman’s determinization [19]. Schewe [21] simplifies Piterman’s
construction (establishing the same bound as Liu and Wang). Tian and Duan [23]
further improve Schewe’s construction. Fisman and Lustig [7] present a
modularization of Büchi determinization that is aimed mainly at easing understanding
of the construction. Parity automata can be determinized by first converting
them to Büchi automata and then applying Büchi determinization. Schewe and
Varghese [22] address the direct determinization of parity automata (via Rabin
automata), and prove optimality within a small constant factor, and even
absolute optimality for the Büchi subcase. All these constructions and estimates
concern unrestricted Büchi or parity automata. Recently, Safra-less determiniza- 
tion of limit-deterministic Büchi automata has been described in the context of 
controller synthesis for LTL [6]; the determinization method that we present 
in Sect. 2.2 has been devised independently from [6] but employs a very
similar construction (yielding essentially the same results on the complexity of the
construction). 

The use of games in μ-calculus satisfiability checking goes back to Niwiński
and Walukiewicz [18] and has since been extended to the unguarded
μ-calculus [10] and the coalgebraic μ-calculus [2]. Game-based procedures for
the relational μ-calculus have been implemented in MLSolver [9], and for the
alternation-free coalgebraic μ-calculus in COOL [13].


2 Determinizing Limit-Deterministic Automata 


2.1 Limit-Deterministic Automata 


We recall the basics of parity automata: A parity automaton is a tuple
$A = (V, \Sigma, \delta, u_0, \alpha)$ where $V$ is a set of states, $\Sigma$ is an alphabet,
$\delta \subseteq V \times \Sigma \times V$ is a transition relation, $u_0 \in V$ is an initial state,
and $\alpha : \delta \to \mathbb{N}$ is a priority function that assigns natural numbers to
transitions (assigning priorities to transitions rather than states yields a slightly
more succinct notion of automata while retaining the computational properties of
standard parity automata [22]). For $(v, a) \in V \times \Sigma$, we write
$\delta(v, a) = \{u \mid (v, a, u) \in \delta\}$. The index $idx(A) = \max\{\alpha(t) \mid t \in \delta\}$
of a parity automaton $A$ is its maximal priority. A run $\rho = v_0 v_1 \ldots$ of $A$ on
an infinite word $w = a_0 a_1 \ldots \in \Sigma^\omega$ starting at $v \in V$ is a (possibly infinite)
sequence of states $v_i$ such that $v_0 = v$ and for all $i \ge 0$, $v_{i+1} \in \delta(v_i, a_i)$. We see
runs $\rho$ or words $w$ as functions from natural numbers to states $\rho(i) = v_i \in V$ or
letters $w(i) = a_i \in \Sigma$. For a run $\rho$ on a word $w$, we define the according sequence
$trans(\rho)$ of transitions by $trans(\rho)(i) = (\rho(i), w(i), \rho(i+1))$. We denote the set of
all runs of $A$ on a word $w$ starting at $v$ by $run(A, v, w)$, or just by $run(A, w)$ if
$v = u_0$. A run $\rho$ of $A$ on a word $w$ is accepting if the highest priority that occurs
infinitely often in it (notation: $\max(Inf(\alpha \circ trans(\rho)))$; we generally write $Inf(s)$
for the set of elements occurring infinitely often in a sequence $s$) is even. A parity
automaton $A$ accepts an infinite word $w$ if $run(A, w)$ contains an accepting run,
and we denote by $L(A) \subseteq \Sigma^\omega$ the set of all words that are accepted by $A$.
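For ultimately periodic runs (a finite prefix followed by a cycle repeated forever), this acceptance condition reduces to inspecting the cycle, because exactly the priorities on the cycle occur infinitely often. A minimal sketch, using a hypothetical helper that receives the priority sequence α ∘ trans(ρ) split into prefix and cycle:

```python
from typing import Sequence


def lasso_accepting(prefix: Sequence[int], cycle: Sequence[int]) -> bool:
    """Parity acceptance of a run whose transition priorities are prefix followed by cycle^omega.

    Exactly the priorities on the cycle occur infinitely often, so the prefix does not
    influence acceptance: the run is accepting iff max(cycle) is even.
    """
    if not cycle:
        raise ValueError("an infinite run needs a non-empty cycle")
    return max(cycle) % 2 == 0
```

For instance, a run that sees priorities 5 and 7 once and then repeats 1, 2 forever is accepting, since 2 is the highest priority occurring infinitely often.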
Given a state $v \in V$ and a letter $a \in \Sigma$, we define $\delta|_v^a = \{(v, a, u) \mid
u \in \delta(v, a)\}$. Given a set $\gamma \subseteq \delta$ of transitions, a state $v \in V$, a set of states
$U \subseteq V$ and a letter $a \in \Sigma$, we put $\gamma(U, a) = \bigcup\{\gamma(v, a) \mid v \in U\}$; given a finite
word $w = a_0 \ldots a_n$, we then recursively define $\gamma(v, w) = \gamma(\gamma(v, a_0), a_1 \ldots a_n)$,
obtaining the set of all states reachable from $v$ when reading $w$ while only using
transitions from $\gamma$. For $U \subseteq V$, $\gamma \subseteq \delta$ and $w \in \Sigma^*$, we put $\gamma(U, w) = \bigcup\{\gamma(u, w) \mid
u \in U\}$. Furthermore, we define the set of states that are reachable from a node
$v \in V$ using transitions from $\gamma$ as $reach_\gamma(v) = \bigcup\{\gamma(v, w) \mid w \in \Sigma^*\}$; we extend
this notation to sets of nodes, putting $reach_\gamma(U) = \bigcup\{reach_\gamma(u) \mid u \in U\}$ for
$U \subseteq V$. If $\gamma = \delta$, then we omit the subscripts. A state $v \in V$ is said to be
deterministic (in $\gamma \subseteq \delta$) if it has at most one ($\gamma$-)successor for each letter $a \in \Sigma$.
A set $U \subseteq V$ is deterministic (in $\gamma \subseteq \delta$) if every state $v \in U$ is deterministic
(in $\gamma$). The automaton $A$ is said to be deterministic if $V$ is deterministic; the
transition relation in deterministic automata hence is a partial function (since
such automata can be transformed to equivalent automata with total transition
function, this definition suffices for purposes of determinization). We put
$\alpha(i) = \{t \in \delta \mid \alpha(t) = i\}$ and $\alpha_\le(i) = \{t \in \delta \mid \alpha(t) \le i\}$.
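The restricted successor sets γ(U, w), the reachability sets reach_γ and the determinism check translate directly into code. A minimal Python sketch, assuming transitions are stored as (source, letter, target) triples and words as strings of one-character letters (both encoding choices, like the function names, are hypothetical):

```python
from collections import deque
from typing import Hashable, Iterable, Set, Tuple

Transition = Tuple[Hashable, str, Hashable]


def post(gamma: Set[Transition], states: Iterable[Hashable], word: str) -> Set[Hashable]:
    """gamma(U, w): states reachable from U by reading w, using only transitions in gamma."""
    current = set(states)
    for letter in word:
        current = {u for (v, a, u) in gamma if v in current and a == letter}
    return current


def reach(gamma: Set[Transition], states: Iterable[Hashable]) -> Set[Hashable]:
    """reach_gamma(U): all states reachable from U (U itself via the empty word)."""
    seen = set(states)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        for (x, _, u) in gamma:
            if x == v and u not in seen:
                seen.add(u)
                queue.append(u)
    return seen


def is_deterministic(gamma: Set[Transition], states: Iterable[Hashable]) -> bool:
    """Every state in the set has at most one gamma-successor per letter."""
    wanted = set(states)
    targets = {}
    for (v, a, u) in gamma:
        if v in wanted:
            targets.setdefault((v, a), set()).add(u)
    return all(len(us) <= 1 for us in targets.values())
```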

A Büchi automaton is a parity automaton with only the priorities 1 and 2; 
the set of accepting transitions then is F = a(2) and a run is accepting if it 
passes infinitely many accepting transitions. For Büchi automata, we assume 
w.l.o.g. that every transition $t \in F$ is part of a cycle. We use the abbreviations
(N/D)PA, (N/D)BA to denote the different types of automata. 

Our notion of limit-determinism of automata is defined as a semantic prop- 
erty: 


Definition 1 (Limit-deterministic parity automata). A PA $A = (V, \Sigma, \delta, u_0, \alpha)$
is limit-deterministic if there is, for each word $w$ and each accepting
run $\rho \in run(A, w)$, a number $i$ such that for all $j > i$, $\delta|_{\rho(j)}^{w(j)} \cap \alpha_\le(L) =
\{trans(\rho)(j)\}$, where $L = \max(Inf(\alpha \circ trans(\rho)))$.


If $A$ is a BA, then we have $\max(Inf(\alpha \circ trans(\rho))) = 2$ for every accepting run
$\rho$; as $\alpha_\le(2) = \delta$, the above definition instantiates to requiring the existence of a
number $i$ such that for all $j > i$, $\delta(\rho(j), w(j)) = \{\rho(j+1)\}$.


Definition 2 (Compartments). Given a PA $A = (V, \Sigma, \delta, u_0, \alpha)$ with $k$
priorities, and an even number $l \le k$, the $l$-compartment $C_l(t)$ of a transition
$t \in \alpha(l)$ is the set $reach_{\alpha_\le(l)}(\pi_3(t))$, where $\pi_3$ projects transitions $t = (v, a, u)$ to
their target nodes $u$. If $l$ is irrelevant, then we refer to $l$-compartments just as
compartments. The size of a compartment $C$ is just $|C|$. A compartment $C$ is
internally deterministic if for each $v \in C$ and all $a \in \Sigma$, $|\delta(v, a) \cap C| \le 1$.


Note that the union of all $l$-compartments is $reach_{\alpha_\le(l)}(\pi_3[\alpha(l)])$. Compartments
allow for a syntactic characterization of limit-determinism: 


Lemma 3. A PA is limit-deterministic if and only if all its compartments are 
internally deterministic. 
Corollary 4. It is decidable in polynomial time whether a given automaton is
limit-deterministic.
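Lemma 3 turns limit-determinism into a graph-search problem, which is what makes the polynomial bound of Corollary 4 plausible. The sketch below computes the compartment of every transition with even priority and tests internal determinism, using one breadth-first search per such transition. The encoding (transition triples, priorities in a dictionary) and the function names are assumptions for illustration.

```python
from collections import deque


def compartment(delta, alpha, t, l):
    """C_l(t) = reach_{alpha_<=(l)}(pi_3(t)) for a transition t with alpha[t] == l."""
    allowed = [s for s in delta if alpha[s] <= l]
    seen = {t[2]}
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        for (x, _, u) in allowed:
            if x == v and u not in seen:
                seen.add(u)
                queue.append(u)
    return seen


def internally_deterministic(delta, C):
    """|delta(v, a) ∩ C| <= 1 for every v in C and every letter a."""
    count = {}
    for (v, a, u) in delta:
        if v in C and u in C:
            count[(v, a)] = count.get((v, a), 0) + 1
    return all(n <= 1 for n in count.values())


def is_limit_deterministic(delta, alpha):
    """Lemma 3: limit-deterministic iff all compartments are internally deterministic."""
    return all(
        internally_deterministic(delta, compartment(delta, alpha, t, alpha[t]))
        for t in delta
        if alpha[t] % 2 == 0
    )
```

In the usage check below, adding a second b-successor to a state inside the compartment breaks limit-determinism, as Lemma 3 predicts.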


Lemma 3 specializes to BA as follows: we have $\alpha(0) = \emptyset$, $\alpha_\le(2) = \delta$ and
$\alpha(2) = F$, so that the union of all 0-compartments is empty and that of all
2-compartments is $reach(\pi_3[F])$; thus a BA is limit-deterministic if and only
if $reach(\pi_3[F])$ is deterministic. Such Büchi automata are also called
semi-deterministic [3].


2.2 Determinizing Limit-Deterministic Büchi Automata


The Safra/Piterman construction [19,20] determinizes Büchi automata by means
of so-called Safra trees, i.e. trees whose nodes are labelled with sets of states of 
the input automaton such that the label of a node is a proper superset of the 
union of all its children’s labels. Additionally, the nodes are ordered by their 
age and upon each transition between Safra trees, the ages of the oldest nodes 
that are active and/or removed during this transition determine the priority of 
the new Safra tree. In its original formulation, the Safra/Piterman construction 
adds new child nodes to the graph that are labelled with the accepting states in 
their parent’s label. We observe that this step can be modified slightly — with- 
out affecting the correctness of the construction — by letting every accepting 
state from the parent’s label receive its own separate child node; then the labels 
of newly created nodes are always singletons. Limit-determinism of the input 
automaton then implies that the node labels also remain singletons. Since sin- 
gleton nodes do not have children in Safra trees, this leads to the collapse of their 
tree structure; the resulting data structure is essentially a partial permutation, 
i.e. a non-repetitive list, of states (ordered by their age). The arising modified 
Safra/Piterman construction for the limit-deterministic case boils down to the 
following method, which (a) has a relatively short presentation and a simpler 
correctness proof than the full Safra/Piterman construction, and (b) results in 
asymptotically smaller automata; the underlying idea of the construction has 
first been described in the context of controller synthesis for LTL [6]. 


Definition 5 (Partial permutations). Given a set $U$ of states, let $pperm(U)$
denote the set of partial permutations over $U$, i.e. the set of non-repetitive lists
$l = [v_1, \ldots, v_n]$ with $v_i \neq v_j$ for $i \neq j$ and $v_i \in U$, for all $1 \le i \le n$. We denote
the $i$-th element in $l$ by $l(i) = v_i$, the empty partial permutation by $[]$ and the
length of a partial permutation $l$ by $|l|$.
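Partial permutations are easy to enumerate, and counting them shows where factorial state bounds come from: over a q-element set there are Σ_{i=0}^{q} q!/(q − i)! of them, which is less than e · q!. A small Python sketch (helper names hypothetical):

```python
import math
from itertools import permutations


def pperm(U):
    """All partial permutations over U, i.e. non-repetitive lists, as tuples."""
    items = sorted(U)
    return [p for n in range(len(items) + 1) for p in permutations(items, n)]


def pperm_count(q):
    """|pperm(U)| for |U| = q: the sum over all lengths i of q!/(q - i)!."""
    return sum(math.factorial(q) // math.factorial(q - i) for i in range(q + 1))
```

For example, `pperm({1, 2})` yields the five lists (), (1,), (2,), (1, 2) and (2, 1).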


Definition 6 (Determinization of limit-deterministic BA). Fix a
limit-deterministic BA $A = (V, \Sigma, \delta, u_0, F)$, and put $Q = reach(\pi_3[F])$, $\bar{Q} = V \setminus Q$,
$q = |Q|$. Define the DPA $B = (W, \Sigma, \delta', w_0, \alpha)$ by putting $W = \mathcal{P}(\bar{Q}) \times pperm(Q)$,
$w_0 = (\{u_0\}, [])$ if $u_0 \in \bar{Q}$, $w_0 = (\emptyset, [u_0])$ if $u_0 \in Q$, and for $g = (U, l) \in W$ and
$a \in \Sigma$, $\delta'(g, a) = h$, where $h = (\delta(U, a) \cap \bar{Q}, l')$ and where $l'$ is constructed from
$l = [v_1, \ldots, v_m]$ as follows:

1. Define a list $t$ of length $m$ over $Q \cup \{*\}$ (with $*$ representing undefinedness)
in which $t(i) = w$ if $\delta(v_i, a) = \{w\}$, and $t(i) = *$ if $\delta(v_i, a) = \emptyset$.

2. For $j < k$ and $t(j) = t(k)$, put $t(k) = *$.

3. Remove undefined entries in $t$, formally: for each $1 \le i \le |t|$, if $t(i) = *$, then
iteratively put $t(j) = t(j+1)$ for each $i \le j < |t|$, starting at $i$.

4. For any $w \in \delta(U, a) \cap Q$ that does not occur in $t$, add $w$ to the end of $t$. If
there are several such $w$, the order in which they are added to $t$ is irrelevant.

5. Put $l' = t$.

Temporarily, $t$ may contain duplicate or undefined entries, but Steps 2 and 3
ensure that in the end, $t$ is a partial permutation of length at most $q$. Let $r$ (for
‘removed’) denote the lowest index $i$ such that $t(i) = *$ after Step 2. Let $a$ (for
‘active’) denote the lowest index $i$ such that $(l(i), a, l'(i)) \in F$. (When no such index
exists, we let $r$ or $a$, respectively, exceed $|V|$.) If $r > |V|$ and
there is no $i$ with $(l(i), a, l'(i)) \in F$, then put $\alpha(g, a, h) = 1$. Otherwise, put

$$
\alpha(g, a, h) =
\begin{cases}
2(q - r) + 3 & \text{if } r \le a,\\
2(q - a) + 2 & \text{if } r > a.
\end{cases}
$$


Theorem 7. We have $L(A) = L(B)$, and $B$ has at most $2n + 1$ priorities, where
$n = |V|$; for $n > 4$, we have $|W| < n! \cdot e$.


Corollary 8. Limit-deterministic Büchi automata of size n can be determinized 
to deterministic parity automata of size O(n!) and with O(n) priorities. 


Example 9. Consider the limit-deterministic BA $A$ depicted below and the
determinized DPA $B$ that is constructed from it by applying the method. We
see by Lemma 3 that $A$ is really limit-deterministic: we have $F = \{(1, b, 3)\}$, i.e.
the $b$-transition from state 1 to state 3 (depicted with a boxed transition label)
is the only accepting transition; thus we have $Q = reach(\pi_3[F]) = \{1, 3\}$ (so
$\bar{Q} = \{0, 2\}$), and the states 1 and 3 are deterministic. Moreover, $L(A) = L(B) =
a(a|b)^*(a^+b)^\omega$.


Notice that in $B$, there is a $b$-transition with priority 1 from the initial state to
the sink state $(\emptyset, [])$ and an $a$-transition to $(\{0, 2\}, [1])$; as $1 \in Q$ but $(0, a, 1) \notin F$,
this transition has priority 1. A further $b$-transition leads from 1 to 3 in $A$; in $B$,
we have a $b$-transition from $(\{0, 2\}, [1])$ to $(\{2\}, [3])$, and since $(1, b, 3) \in F$, the
first position in the permutation component is active during this transition, so
that the transition has priority 4. Yet another $b$-transition loops from $(\{2\}, [3])$
to $(\{2\}, [3])$. Since there is no $b$-transition starting at state 3, the first element
in the permutation is removed in Step 1 of the construction. Since there is a
$b$-transition from 2 to 3, it is added to the permutation again in Step 4 of the
construction. Crucially, however, the priority of the transition is 5, since the
first item of the permutation has been (temporarily) removed. The intuition is 
that the trace of 3 ends when the letter b is read; even though a new trace of 3 
immediately starts, we do not consider it to be the same trace as the previous 
one. Thus the transition obtains priority 5 so that it may be used only finitely 
often in an accepting run of B, i.e. accepting runs contain an uninterrupted 
trace that visits state 3 infinitely often. Thus two or more consecutive b’s can 
only occur finitely often in any accepted word. 
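Definition 6 can be executed directly on a finite BA. The Python sketch below implements the transition function of B, representing the undefined entry ∗ by `None` and an undefined index r or a by infinity. The automaton used in the usage lines is hypothetical: it is merely chosen to contain the transitions of A that Example 9 mentions, and the function names are illustrative.

```python
from collections import deque

INF = float("inf")


def reach_from(delta, starts):
    """reach(U): states reachable from `starts` (including them)."""
    seen = set(starts)
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        for (x, _, u) in delta:
            if x == v and u not in seen:
                seen.add(u)
                queue.append(u)
    return seen


def make_step(delta, F):
    """Transition function of the DPA B from Definition 6 for the BA with transitions delta."""
    Q = reach_from(delta, {u for (_, _, u) in F})  # Q = reach(pi_3[F])
    q = len(Q)

    def succ(states, a):
        return {u for (v, b, u) in delta if v in states and b == a}

    def step(U, l, a):
        # Step 1: successor of every entry; states in Q have at most one per letter
        t = []
        for v in l:
            s = succ({v}, a)
            t.append(next(iter(s)) if s else None)
        # Step 2: a later duplicate of an earlier entry becomes undefined
        seen = set()
        for i, w in enumerate(t):
            if w is not None:
                if w in seen:
                    t[i] = None
                seen.add(w)
        r = next((i + 1 for i, w in enumerate(t) if w is None), INF)  # 1-based index
        # Step 3: drop undefined entries, keeping the order of the others
        t = [w for w in t if w is not None]
        # Step 4: append states of Q newly entered from the nondeterministic part
        succ_U = succ(U, a)
        t += sorted((succ_U & Q) - set(t), key=repr)
        new_l = tuple(t)  # Step 5
        act = next((i + 1 for i in range(min(len(l), len(new_l)))
                    if (l[i], a, new_l[i]) in F), INF)
        if r == INF and act == INF:
            prio = 1
        elif r <= act:
            prio = 2 * (q - r) + 3
        else:
            prio = 2 * (q - act) + 2
        return frozenset(succ_U - Q), new_l, prio

    return step


# Hypothetical automaton containing the transitions of A mentioned in Example 9.
DELTA = {(0, "a", 0), (0, "a", 1), (0, "a", 2), (2, "a", 2), (2, "b", 2),
         (2, "b", 3), (1, "a", 1), (1, "b", 3), (3, "a", 1)}
step = make_step(DELTA, {(1, "b", 3)})
```

Applied to the states discussed in Example 9, `step` reproduces the priorities 1, 1, 4 and 5 of the transitions described there.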


2.3 Determinizing Limit-Deterministic Parity Automata 


To determinize limit-deterministic PA, it suffices to transform them to equivalent 
limit-deterministic BA and determinize the BA. This transformation from PA to 
BA is achieved by a construction which is inspired by Theorems 2 and 3 in [14]; 
we add the observation that the construction preserves limit-determinism. 


Definition 10. Given a limit-deterministic PA $C = (V, \Sigma, \delta, u_0, \alpha)$ with $n = |V|$
and $k \ge 2$ priorities, we define the limit-deterministic BA $D = (W, \Sigma, \delta', u_0, F)$
by putting $W = V \cup (V \times \{0, \ldots, \lfloor k/2 \rfloor - 1\})$, and for $v \in V$, $(v, l) \in W \setminus V$ and $a \in \Sigma$,

$$
\delta'(v, a) = \{(w, l) \mid (v, a, w) \in \alpha(2l)\} \cup \delta(v, a), \qquad
\delta'((v, l), a) = \{(w, l) \mid (v, a, w) \in \alpha_\le(2l)\}.
$$

Finally, we put $F = \{((v, l), a, (w, l)) \in \delta' \mid \alpha(v, a, w) = 2l\}$. To see that $D$ is
limit-deterministic, it suffices by Lemma 3 to show that $reach(\pi_3[F])$ is
deterministic. We observe that for each state $(w, l) \in reach(\pi_3[F])$, $(w, l)$ is deterministic
by definition of $\delta'$ since $w$ is contained in a (by Lemma 3, internally
deterministic) $2l$-compartment of $C$.


Lemma 11. We have $L(C) = L(D)$ and $|W| \le n(\lfloor k/2 \rfloor + 1) \le nk$.


By Theorem 7, $D$ can be determinized to a DPA $E$ of size at most $(nk)! \cdot e$, with
at most $nk + 2$ priorities and with $L(D) = L(E)$.


Corollary 12. Limit-deterministic parity automata of size n with k priorities 
can be determinized to deterministic parity automata of size O((nk)!) and with 
O(nk) priorities. 


3 Permutation Games for the Aconjunctive μ-Calculus


3.1 The μ-Calculus


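We briefly recall the definition of the μ-calculus. We fix a set $P$ of propositions,
a set $A$ of actions, and a set $\mathfrak{V}$ of fixpoint variables. The set $L_\mu$ of μ-calculus
formulas is the set of all formulas $\phi, \psi$ that can be constructed by the grammar

$$
\psi, \phi ::= \bot \mid \top \mid p \mid \neg p \mid X \mid \psi \wedge \phi \mid \psi \vee \phi \mid \langle a \rangle \psi \mid [a]\psi \mid \mu X.\, \psi \mid \nu X.\, \psi
$$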
where $p \in P$, $a \in A$, and $X \in \mathfrak{V}$; we write $|\psi|$ for the size of a formula $\psi$.
Throughout the paper, we use $\eta$ to denote one of the fixpoint operators $\mu$ or
$\nu$. We refer to formulas of the form $\eta X.\, \psi$ as fixpoint literals, to formulas of
the form $\langle a \rangle \psi$ or $[a]\psi$ as modal literals, and to $p, \neg p$ as propositional literals.
The operators $\mu$ and $\nu$ bind their variables, inducing a standard notion of free
variables in formulas. We denote the set of free variables of a formula $\psi$ by
$FV(\psi)$. A formula $\psi$ is closed if $FV(\psi) = \emptyset$, and open otherwise. We write
$\psi \le \phi$ ($\psi < \phi$) to indicate that $\psi$ is a (proper) subformula of $\phi$. We say that
$\phi$ occurs free in $\psi$ if $\phi$ occurs in $\psi$ as a subformula that is not in the scope of
any fixpoint operator. Throughout, we restrict to formulas that are guarded, i.e.
have at least one modal operator between any occurrence of a variable $X$ and
an enclosing binder $\eta X$. (This is standard although possibly not without loss of
generality [10].) Moreover we assume w.l.o.g. that input formulas are clean, i.e. all
fixpoint variables are mutually distinct and distinct from all free variables, and
irredundant, i.e. $X \in FV(\psi)$ for all subformulas $\eta X.\, \psi$. We refer to a variable $X$
that is bound by a least (greatest) fixpoint operator $\mu X.\, \chi$ ($\nu X.\, \chi$) in a formula $\phi$
as a $\mu$-variable ($\nu$-variable) of $\phi$, and to the process of substituting such an $X$
with its binding fixpoint literal ($\mu X.\, \chi$ or $\nu X.\, \chi$, respectively) as unfolding. An
occurrence of a subformula $\psi$ of a formula $\phi$ contains an active $\mu$-variable [15]
if $\psi$ can be converted into a formula containing a free occurrence of a $\mu$-variable
of $\phi$ by repeatedly unfolding $\nu$-variables of $\phi$.

Formulas are evaluated over Kripke structures $K = (W, (R_a)_{a \in A}, \pi)$, consisting
of a set $W$ of states, a family $(R_a)_{a \in A}$ of relations $R_a \subseteq W \times W$, and a valuation
$\pi : P \to \mathcal{P}(W)$ of the propositions. Given an interpretation $i : \mathfrak{V} \to \mathcal{P}(W)$ of
the fixpoint variables, define $[\![\psi]\!]_i \subseteq W$ by the obvious clauses for Boolean
operators and propositions, $[\![X]\!]_i = i(X)$, $[\![\langle a \rangle \psi]\!]_i = \{v \in W \mid \exists w \in R_a(v).\, w \in [\![\psi]\!]_i\}$,
$[\![[a]\psi]\!]_i = \{v \in W \mid \forall w \in R_a(v).\, w \in [\![\psi]\!]_i\}$, $[\![\mu X.\, \psi]\!]_i = \mu [\![\psi]\!]_i^X$ and $[\![\nu X.\, \psi]\!]_i =
\nu [\![\psi]\!]_i^X$, where $R_a(v) = \{w \in W \mid (v, w) \in R_a\}$, $[\![\psi]\!]_i^X(G) = [\![\psi]\!]_{i[X \mapsto G]}$, and $\mu$,
$\nu$ take least and greatest fixpoints of monotone functions, respectively. If $\psi$ is
closed, then $[\![\psi]\!]_i$ does not depend on $i$, so we just write $[\![\psi]\!]$. We denote the
Fischer–Ladner closure [16] of a formula $\phi$ by $\mathbf{F}(\phi)$, or just by $\mathbf{F}$, if no confusion
arises; intuitively, $\mathbf{F}$ is the set of formulas that can arise as subformulas when
unfolding each fixpoint operator in $\phi$ at most once. We note $|\mathbf{F}| \le |\phi|$ [16].

The aconjunctive fragment [15] of the $\mu$-calculus is obtained by requiring that for all conjunctions that occur as a subformula, at most one of the conjuncts contains an active $\mu$-variable. In the weakly aconjunctive fragment [24], this requirement is loosened to the constraint that all conjunctions that occur as a subformula and contain an active $\mu$-variable are of the shape $\psi \wedge \Diamond\psi_1 \wedge \dots \wedge \Diamond\psi_n \wedge \Box(\psi_1 \vee \dots \vee \psi_n)$, where $\psi$ does not contain active $\mu$-variables. For instance, for all $n$, the formula $\eta X_n. \cdots \mu X_1.\,\nu X_0.\,\bigvee_{0 \le i \le n}(q_i \wedge \Diamond X_i)$ is aconjunctive (and equivalent to the weakly aconjunctive formula obtained by replacing $\Diamond X_i$ with $\Diamond X_i \wedge \Diamond\top \wedge \Box(X_i \vee \top)$). The permutation satisfiability games that we introduce work for the more expressive weakly aconjunctive fragment.
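We will make use of the standard tableau rules [10] (each consisting of one premise and a possibly empty set of conclusions):

$$(\bot)\;\frac{\Gamma, \bot}{}\qquad (\neg)\;\frac{\Gamma, p, \neg p}{}\qquad (\wedge)\;\frac{\Gamma, \psi \wedge \phi}{\Gamma, \psi, \phi}\qquad (\vee)\;\frac{\Gamma, \psi \vee \phi}{\Gamma, \psi \qquad \Gamma, \phi}$$

$$(\langle a \rangle)\;\frac{\Gamma, [a]\psi_1, \dots, [a]\psi_n, \langle a \rangle \phi}{\psi_1, \dots, \psi_n, \phi}\qquad (\eta)\;\frac{\Gamma, \eta X.\,\psi}{\Gamma, \psi[X \mapsto \eta X.\,\psi]}$$

(for $a \in A$, $p \in P$); we refer to the set of tableau rules by $\mathcal{R}$ and usually write rule applications with premise $\Gamma$ and conclusion $\Sigma = \Gamma_1, \dots, \Gamma_n$ sequentially: $(\Gamma/\Sigma)$.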

To track fixpoint formulas through pre-tableaux, we will use deferrals, that 
is, the decomposed form of formulas that are obtained by unfolding fixpoint 
literals. 


Definition 13 (Deferrals). Given fixpoint literals $\chi_i = \eta X_i.\,\psi_i$, $i = 1, \dots, n$, we say that a substitution $\sigma = [X_1 \mapsto \chi_1]; \dots; [X_n \mapsto \chi_n]$ sequentially unfolds $\chi_n$ if $\chi_i \le_f \chi_{i+1}$ for all $1 \le i < n$, where we write $\psi \le_f \eta X.\,\phi$ if $\psi \le \phi$ and $\psi$ is open and occurs free in $\phi$ (i.e. $\sigma$ unfolds a nested sequence of fixpoints in $\chi_n$ innermost-first). We say that a formula $\chi$ is irreducible if for every substitution $[X_1 \mapsto \chi_1]; \dots; [X_n \mapsto \chi_n]$ that sequentially unfolds $\chi_n$, we have that $\chi = \chi_1([X_2 \mapsto \chi_2]; \dots; [X_n \mapsto \chi_n])$ implies $n = 1$ (i.e. $\chi = \chi_1$). A formula $\psi$ belongs to an irreducible closed fixpoint literal $\theta_n$, or is a $\theta_n$-deferral, if $\psi = \alpha\sigma$ for some substitution $\sigma = [X_1 \mapsto \theta_1]; \dots; [X_n \mapsto \theta_n]$ that sequentially unfolds $\theta_n$, and some $\alpha \le_f \theta_1$. We denote the set of $\theta_n$-deferrals by $\mathsf{dfr}(\theta_n)$.


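E.g. the substitution $\sigma = [Y \mapsto \mu Y.\,(\Diamond X \wedge \Diamond\Diamond Y)]; [X \mapsto \theta]$ sequentially unfolds the irreducible closed formula $\theta = \nu X.\,\mu Y.\,(\Diamond X \wedge \Diamond\Diamond Y)$, and $(\Diamond Y)\sigma = \Diamond\mu Y.\,(\Diamond\theta \wedge \Diamond\Diamond Y)$ is a $\theta$-deferral. A fixpoint literal is irreducible if it is not an unfolding $\psi[X \mapsto \eta X.\,\psi]$ of a fixpoint literal $\eta X.\,\psi$; in particular, every clean irredundant fixpoint literal is irreducible.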

As a technical tool, we define a measure for the depth of alternation at which 
a deferral resides inside the fixpoint to which it belongs: 


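Definition 14 (Alternation level and alternation depth). The alternation level $\mathsf{al}(\psi\sigma) := \mathsf{al}(\sigma)$ of a deferral $\psi\sigma$ is defined inductively over $|\sigma|$, where $\mathsf{al}(\epsilon) = \mathsf{al}(\epsilon)_\mu = \mathsf{al}(\epsilon)_\nu = 0$ for the empty substitution $\epsilon$, $\mathsf{al}(\sigma; [X \mapsto \eta X.\,\psi]) = \mathsf{al}(\sigma)_\mu + 1$ if $\eta = \mu$ and $\mathsf{al}(\sigma; [X \mapsto \eta X.\,\psi]) = \mathsf{al}(\sigma)_\nu$ otherwise, and

$$\mathsf{al}(\sigma; [X \mapsto \eta X.\,\psi])_\mu = \begin{cases} \mathsf{al}(\sigma)_\mu & \text{if } \eta = \mu \\ \mathsf{al}(\sigma)_\nu + 1 & \text{otherwise} \end{cases} \qquad \mathsf{al}(\sigma; [X \mapsto \eta X.\,\psi])_\nu = \begin{cases} \mathsf{al}(\sigma)_\nu & \text{if } \eta = \nu \\ \mathsf{al}(\sigma)_\mu + 1 & \text{otherwise.} \end{cases}$$

This definition assigns greater numbers to inner fixpoint literals, i.e. to deferrals which occur at higher nesting depth, i.e. with more alternation inside their sequence $\sigma$. Given a formula $\phi$, its alternation depth $\mathsf{ad}(\phi)$ is defined as $\mathsf{ad}(\phi) = \max\{\mathsf{al}(\delta) \mid \delta \in \mathbf{F},\ \exists\theta.\ \delta \in \mathsf{dfr}(\theta)\}$.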
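The clauses of Definition 14 can be evaluated in a single pass over $\sigma$, extending it by one outer literal at a time. The following Python sketch is our own illustration, not code from the paper; it assumes that a substitution is represented only by the list of fixpoint types (`"mu"` or `"nu"`) of its literals, innermost first as in Definition 13, since the clauses depend on nothing else. The function names are hypothetical.

```python
from typing import List, Tuple


def alternation_levels(sigma: List[str]) -> Tuple[int, int, int]:
    """Return (al(sigma), al(sigma)_mu, al(sigma)_nu) following Definition 14.

    sigma lists the fixpoint types ("mu" or "nu") of the literals of
    [X1 -> chi1]; ...; [Xn -> chin], innermost literal first.
    """
    al, al_mu, al_nu = 0, 0, 0  # values for the empty substitution
    for eta in sigma:  # extend sigma by the next (outer) literal
        if eta == "mu":
            # al = al_mu + 1, al_mu unchanged, al_nu = al_mu + 1
            al, al_mu, al_nu = al_mu + 1, al_mu, al_mu + 1
        elif eta == "nu":
            # al = al_nu, al_mu = al_nu + 1, al_nu unchanged
            al, al_mu, al_nu = al_nu, al_nu + 1, al_nu
        else:
            raise ValueError(f"unknown fixpoint type: {eta!r}")
    return al, al_mu, al_nu


def alternation_depth(deferral_substitutions: List[List[str]]) -> int:
    """ad(phi): the maximal alternation level of the deferrals in F,
    each deferral being given by (the fixpoint types of) its substitution."""
    return max((alternation_levels(s)[0] for s in deferral_substitutions), default=0)
```

For instance, a lone $\mu$-literal yields level 1, a $\nu$-literal directly inside an outermost $\mu$-literal yields level 2, and consecutive literals of the same type do not increase the level.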


370 D. Hausmann et al. 


3.2 Limit-Deterministic Tracking Automata 


As a first step towards deciding the satisfiability of a weakly aconjunctive $\mu$-calculus formula $\phi$, we now construct a tracking automaton that takes branches of (that is, infinite paths through) standard pre-tableaux for $\phi$ as input and accepts a branch if and only if it contains a least fixpoint formula whose satisfaction is deferred indefinitely on that branch. To this end, we import the following notions of threads and tableaux from [10]:


Definition 15. A pre-tableau for a formula $\phi$ is a graph the nodes of which are labelled with subsets of the Fischer-Ladner closure $\mathbf{F}$; the graph structure $L$ of a pre-tableau is constructed by applying tableau rules from $\mathcal{R}$ to the labels of nodes with the requirement that for each rule application $(\Gamma/\Sigma)$ to the label $\Gamma$ of a node $v$, there is a $w$ with $(v, w) \in L$ such that the label of $w$ is contained in $\Sigma$. Nodes whose labels are saturated (i.e. do not contain propositional or fixpoint operators) are called states. Formulas are tracked through rule applications by the connectedness relation $\rightsquigarrow \subseteq (\mathcal{P}(\mathbf{F}) \times \mathbf{F})^2$ that is defined by putting $\Phi, \phi \rightsquigarrow \Psi, \psi$ if and only if $\Psi$ is a conclusion of an application of a rule from $\mathcal{R}$ to $\Phi$ such that $\phi \in \Phi$, $\psi \in \Psi$, and the rule application transforms $\phi$ to $\psi$; if the rule application does not change $\phi$, then $\phi = \psi$. E.g. we have $\Phi, \psi_1 \wedge \psi_2 \rightsquigarrow \Psi, \psi_i$, where $i \in \{1, 2\}$ and $\Psi$ is obtained from $\Phi$ by applying the rule $(\wedge)$ to $\psi_1 \wedge \psi_2$. A branch $\Psi_0, \Psi_1, \dots$ in a pre-tableau is a sequence of labels such that for all $i \ge 0$, $\Psi_{i+1}$ is an $L$-successor of $\Psi_i$. A thread on an infinite branch $\Psi_0, \Psi_1, \dots$ is an infinite sequence $t = \psi_0, \psi_1, \dots$ of formulas with $\Psi_0, \psi_0 \rightsquigarrow \Psi_1, \psi_1 \rightsquigarrow \dots$. A $\mu$-thread is a thread $t$ such that $\min(\mathsf{Inf}(\mathsf{al} \circ t))$ is odd, i.e. the outermost fixpoint literal that is unfolded infinitely often in $t$ is a least fixpoint literal. A bad branch is an infinite branch that contains a $\mu$-thread. A tableau for $\phi$ is a pre-tableau for $\phi$ that does not contain bad branches.
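For ultimately periodic threads, the $\mu$-thread condition can be checked directly, because the values of $\mathsf{al} \circ t$ that occur infinitely often are exactly those on the repeated part. A minimal Python sketch, assuming (hypothetically) that a thread is given as a finite prefix and a non-empty cycle of formula identifiers, together with a table of their alternation levels:

```python
from typing import Dict, Hashable, Sequence


def is_mu_thread(prefix: Sequence[Hashable], cycle: Sequence[Hashable],
                 al: Dict[Hashable, int]) -> bool:
    """Decide whether the thread prefix . cycle^omega is a mu-thread,
    i.e. whether min(Inf(al o t)) is odd.

    Only formulas on the cycle occur infinitely often, so the prefix does
    not influence the result; it is accepted only for readability.
    """
    if not cycle:
        raise ValueError("the repeated part of an infinite thread must be non-empty")
    return min(al[f] for f in cycle) % 2 == 1
```

A thread that keeps returning to a deferral of level 1 while also visiting level 2 is a $\mu$-thread (the minimum is 1), whereas one that eventually stays at level 2 is not.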


We import from [10] the well-known fact that the existence of tableaux in the sense defined above characterizes satisfiability. In [10], the result is shown for the more general unguarded $\mu$-calculus; we note that the restriction to guarded formulas does not invalidate the theorem.


Theorem 16 ([10]). A $\mu$-calculus formula $\psi$ is satisfiable if and only if there is a tableau for $\psi$.


Given a formula $\phi$, we define the alphabet $\Sigma_\phi$ to consist of letters that each identify a rule $R \in \mathcal{R}$, a principal formula from $\mathbf{F}$ and one of the conclusions of $R$. E.g. the letter $((\vee), 0, p \vee \Diamond q)$ identifies the application of the disjunction rule to a principal formula $p \vee \Diamond q$ and the choice of the left conclusion; thus this letter identifies the transition from $p \vee \Diamond q$ to $p$ by use of rule $(\vee)$. We note $|\Sigma_\phi| \in O(|\phi|)$. Further, we denote the set of all words that encode some branch and some bad branch in some pre-tableau for $\phi$ by $\mathsf{Branch}(\phi)$ and $\mathsf{BadBranch}(\phi)$, respectively.

As a crucial result, we now show that limit-deterministic automata are 
expressive enough to exactly recognize the bad branches in pre-tableaux for 
weakly aconjunctive formulas. 
Lemma 17. Let $\phi$ be a weakly aconjunctive formula. Then there is a limit-deterministic PA $\mathsf{A} = (V, \Sigma_\phi, \delta, \phi, \alpha)$ with $|V| \le |\phi|$ and $\mathsf{idx}(\mathsf{A}) \le \mathsf{ad}(\phi) + 1$ such that $L(\mathsf{A}) \cap \mathsf{Branch}(\phi) = \mathsf{BadBranch}(\phi)$.


Proof (Sketch). The automaton nondeterministically guesses formulas to be tracked, one at a time; the set of states of the automaton is the Fischer-Ladner closure of $\phi$. The priorities of the transitions in the automaton are derived from the alternation level of the target formula of the respective transition; then every word $w \in L(\mathsf{A})$ that encodes some branch encodes a bad branch. Once a deferral is tracked, weak aconjunctivity implies that all compartments to which the tracked formula belongs are internally deterministic; this is the case since for conjunctions $\psi = \psi_0 \wedge \Diamond\psi_1 \wedge \dots \wedge \Diamond\psi_n \wedge \Box(\psi_1 \vee \dots \vee \psi_n)$ (the only case that can introduce nondeterminism), each next modal step determines just one of the formulas $\psi_i$ that has to be tracked; the conjunct $\psi_0$ does not contain active $\mu$-variables, so tracking it causes the automaton to leave all compartments to which $\psi$ belongs. Thus the automaton is limit-deterministic.


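Example 18. We consider the aconjunctive formula

$$\phi = \mu X.\,(p \wedge \nu Y.\,(\Diamond(Y \wedge p) \vee \Box X))$$

which expresses the existence of a finite or infinite path on which $p$ holds everywhere. We have the $\phi$-deferrals $\phi$, $\psi := (p \wedge \nu Y.\,(\Diamond(Y \wedge p) \vee \Box X))\sigma_1$, $\theta := (\nu Y.\,(\Diamond(Y \wedge p) \vee \Box X))\sigma_1$, $\chi := (\Diamond(Y \wedge p) \vee \Box X)\sigma_2$, $(\Diamond(Y \wedge p))\sigma_2$, $\tau := (Y \wedge p)\sigma_2$, $Y\sigma_2$, $(\Box X)\sigma_2$ and $X\sigma_2$, where $\sigma_1 = [X \mapsto \phi]$ and $\sigma_2 = [Y \mapsto \nu Y.\,(\Diamond(Y \wedge p) \vee \Box X)]; \sigma_1$. We consider a pre-tableau $P_\phi$ for $\phi$ and, as in the proof of Lemma 17, we construct the limit-deterministic tracking automaton $\mathsf{A}_\phi$, depicted below: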


[Figure: the pre-tableau $P_\phi$ (left) and the tracking automaton $\mathsf{A}_\phi$ (right).]


The priorities in $\mathsf{A}_\phi$ are derived as follows: as $\mathsf{ad}(\phi) = 2$ is even, we put $k = \mathsf{ad}(\phi) + 1 = 3$; since $\mathsf{al}(\psi) = \mathsf{al}(\theta) = 1$, $\alpha(\phi, (\eta), \psi) = \alpha(\psi, (\wedge), \theta) = k - \mathsf{al}(\psi) = 2$, and since $\mathsf{al}(p) = 0$, $\alpha(\psi, (\wedge), p) = \alpha(\tau, (\wedge), p) = k - \mathsf{al}(p) = 3$. All other formulas have alternation level 2 and transitions to them obtain priority 1. The tracking automaton accepts exactly those branches in $P_\phi$ that start at node 1 and take the loop through node 9 infinitely often; in these branches, $\phi$ can be tracked forever and evolves to $\phi$ infinitely often, i.e. their dominating formula is the least fixpoint formula $\phi$. All other branches loop through node 7 without passing node 9 from some point on; their dominating fixpoint formula is $\theta$, a greatest fixpoint formula. We observe that due to the aconjunctivity of $\phi$, $\mathsf{A}_\phi$ is limit-deterministic since the only two nondeterministic states $\psi$ and $\tau$ each have only one outgoing $(\wedge)$-transition with priority less than $k = 3$.


Given a weakly aconjunctive formula $\phi$, we use Lemma 17 to construct a limit-deterministic tracking automaton $\mathsf{A}_\phi$ with $L(\mathsf{A}_\phi) \cap \mathsf{Branch}(\phi) = \mathsf{BadBranch}(\phi)$. Then we put Lemma 11 to use to obtain an equivalent BA in which all states from $Q = \mathsf{reach}(\delta[F'])$ are levelled deferrals, i.e. pairs $(\psi, q)$ consisting of a deferral $\psi$ and a number $q \le \lceil k/2 \rceil$, the level of the pair $(\psi, q)$; the level $q$ encodes the odd alternation level $2q - 1$. A levelled deferral $(\psi, q)$ is active if $\mathsf{al}(\psi) = 2q - 1$, and the automaton accepts branches which contain a levelled deferral that is active infinitely often without being finished. The set $Q$ is just a subset of $\mathbf{F}$. Next, we use Theorem 7 to transform this BA to a DPA $\mathsf{B}_\phi$ with $L(\mathsf{A}_\phi) = L(\mathsf{B}_\phi)$. We complement $\mathsf{B}_\phi$ to a DPA $\mathsf{C}_\phi = (W, \Sigma_\phi, \delta, (\{\phi\}, []), \alpha)$ by decreasing the priority of each state in $\mathsf{B}_\phi$ by one; we have $L(\mathsf{C}_\phi) = \overline{L(\mathsf{B}_\phi)}$, that is, $\mathsf{C}_\phi$ accepts exactly those words that encode only ‘good’ branches, if they encode some branch in some pre-tableau for $\phi$. By construction, $|W| \in O((nk)!)$ and $\mathsf{C}_\phi$ has at most $nk + 1$ priorities, and (recalling Definitions 6 and 10) the states in the carrier $W$ of $\mathsf{C}_\phi$ are of the shape $(U, l)$, where $U$ is a subset of $\mathbf{F}$ and $l$ is a partial permutation of levelled deferrals. For a transition $t = ((U, l), r, (V, l'))$ with $(U, l), (V, l') \in W$ and $r \in \Sigma_\phi$: if $\alpha(t) = 2(n - a) + 1$, then $a$ is the lowest number such that $\mathsf{al}(\psi) = 2q - 1$, where $l'(a) = (\psi, q)$ and the $a$-th element of $l$ is not removed by the transition $t$ (i.e. $\alpha(t)$ references the oldest levelled deferral in $l'$ that is active but not removed by the transition $t$); and if $\alpha(t) = 2(n - a) + 2$, then $a$ is the index of the oldest levelled deferral $(\psi, q)$ that is finished (i.e. removed from $l$) in the transition $t$ of the automaton $\mathsf{C}_\phi$, which means that the corresponding $r$-transition in $\mathsf{A}_\phi$ makes $\psi$ leave its $(2q - 1)$-compartment. For a state $v = (U, l)$, we define the label $\Gamma(v)$ of $v$ as $\Gamma(v) = U$.
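Complementing by shifting priorities works because a deterministic automaton has exactly one run on each word, and decreasing every priority by one flips the parity of the maximal priority that recurs on that run. The following Python sketch checks this on ultimately periodic runs; it assumes the max-even acceptance convention that Section 3.3 uses for parity games, and it represents transitions by arbitrary hashable labels. All names are hypothetical.

```python
from typing import Dict, Hashable, Sequence

Transition = Hashable


def accepts(cycle: Sequence[Transition], prio: Dict[Transition, int]) -> bool:
    """Max-even parity acceptance of an ultimately periodic run.

    Only the transitions on the repeated cycle occur infinitely often,
    so the finite prefix of the run does not matter.
    """
    return max(prio[t] for t in cycle) % 2 == 0


def complement(prio: Dict[Transition, int]) -> Dict[Transition, int]:
    """Decrease every priority by one, flipping the parity of every maximum."""
    return {t: p - 1 for t, p in prio.items()}
```

With `prio = {"t1": 1, "t2": 2, "t3": 3}`, the cycle `["t1", "t2"]` is accepted under `prio` and rejected under `complement(prio)`, and every non-empty cycle is accepted under exactly one of the two priority assignments.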


3.3 Permutation Games 


The deterministic parity automaton Cg can now be combined with applications 
of tableau rules from R to form a satisfiability game for ¢. We proceed to recall 
the definition of parity games and some ensuing basic notions. A parity game is a graph $G = (V, E, \alpha)$ that consists of a set of nodes $V$, a set of edges $E \subseteq V \times V$ and a priority function $\alpha : E \to \mathbb{N}$, assigning priorities to edges. We assume $V = V_\exists \uplus V_\forall$, that is, every node in $V$ either belongs to player Eloise ($V_\exists$) or to player Abelard ($V_\forall$). A play $\rho$ of $G$ is a (possibly infinite) sequence $v_0 v_1 \dots$ such that for all $i \ge 0$, $v_i \in V$ and $(v_i, v_{i+1}) \in E$. A play $\rho$ of $G$ is won by Eloise if and only if $\rho$ is finite and ends in a node that belongs to Abelard or $\rho$ is infinite and $\max(\mathsf{Inf}(\alpha \circ \mathsf{trans}(\rho)))$ is even (where $\mathsf{trans}(\rho)$ is defined by $\mathsf{trans}(\rho)(i) = (\rho(i), \rho(i+1))$); Abelard wins a play $\rho$ if and only if Eloise does not win $\rho$. A (memoryless) strategy $s : V \rightharpoonup V$ assigns moves to nodes. A play $\rho$ conforms to a strategy $s$ if for all $\rho(i) \in \mathsf{dom}(s)$, $\rho(i+1) = s(\rho(i))$. Eloise has a winning strategy for a node $v$ if there is a strategy $s : V_\exists \to V$ such that every play of $G$ that starts at $v$ and conforms to $s$ is won by Eloise; we have a dual notion of winning strategies for Abelard. The winning regions $\mathsf{win}_\exists(G)$ and $\mathsf{win}_\forall(G)$ are the sets of those nodes for which Eloise and Abelard have winning strategies, respectively. Solving a parity game $G$ (locally) for a particular node $v \in V$ amounts to computing the winner of $v$.
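To make these definitions concrete, the following Python sketch computes Eloise's winning region of a small, explicitly given game in the above sense (priorities on edges, max-even winning condition, a player who cannot move loses). It uses Zielonka's classical recursive algorithm, which is neither the fixpoint iteration algorithm [1] used in COOL nor one of PGSolver's algorithms; edge priorities are handled by splitting each edge at an intermediate node that carries the edge's priority, and dead ends are redirected to self-looping sinks that are losing for the stuck player. All names are hypothetical, and the sketch makes no attempt at efficiency.

```python
from typing import Dict, Hashable, List, Set, Tuple

Node = Hashable


def attractor(nodes: Set[Node], succ: Dict[Node, List[Node]], owner: Dict[Node, str],
              player: str, target: Set[Node]) -> Set[Node]:
    """Nodes of the subgame `nodes` from which `player` can force a visit to `target`."""
    attr = set(target) & nodes
    changed = True
    while changed:
        changed = False
        for v in nodes - attr:
            out = [w for w in succ[v] if w in nodes]
            if owner[v] == player:
                pulled = any(w in attr for w in out)
            else:
                pulled = all(w in attr for w in out)
            if pulled:
                attr.add(v)
                changed = True
    return attr


def zielonka(nodes: Set[Node], succ, owner, prio) -> Tuple[Set[Node], Set[Node]]:
    """Winning regions (Eloise, Abelard) of a total game with node priorities, max-even."""
    if not nodes:
        return set(), set()
    d = max(prio[v] for v in nodes)
    p = "E" if d % 2 == 0 else "A"
    opp = "A" if p == "E" else "E"
    a = attractor(nodes, succ, owner, p, {v for v in nodes if prio[v] == d})
    win_e, win_a = zielonka(nodes - a, succ, owner, prio)
    win = {"E": win_e, "A": win_a}
    if not win[opp]:
        # The player favoured by the top priority d wins the whole subgame.
        return (set(nodes), set()) if p == "E" else (set(), set(nodes))
    b = attractor(nodes, succ, owner, opp, win[opp])
    win_e, win_a = zielonka(nodes - b, succ, owner, prio)
    win = {"E": win_e, "A": win_a}
    win[opp] |= b
    return win["E"], win["A"]


def eloise_wins(owner0: Dict[Node, str], edges: Dict[Tuple[Node, Node], int]) -> Set[Node]:
    """Eloise's winning region of a game with edge priorities ("E"/"A" owners)."""
    owner: Dict[Node, str] = {}
    prio: Dict[Node, int] = {}
    succ: Dict[Node, List[Node]] = {}
    for u, pl in owner0.items():
        owner[("n", u)], prio[("n", u)], succ[("n", u)] = pl, 0, []
    for (u, v), p in edges.items():
        mid = ("e", u, v)  # intermediate node carrying the edge priority
        owner[mid], prio[mid], succ[mid] = "E", p, [("n", v)]
        succ[("n", u)].append(mid)
    # Self-looping sinks: priority 2 is won by Eloise, priority 1 by Abelard.
    for sink, p in (("sinkE", 2), ("sinkA", 1)):
        owner[sink], prio[sink], succ[sink] = "E", p, [sink]
    for u, pl in owner0.items():
        if not succ[("n", u)]:
            # A player who cannot move loses: lead to the opponent's sink.
            succ[("n", u)].append("sinkA" if pl == "E" else "sinkE")
    win_e, _ = zielonka(set(owner), succ, owner, prio)
    return {u for u in owner0 if ("n", u) in win_e}
```

For example, if Eloise at node `a` can move either to an Abelard node looping with priority 1 or to one looping with priority 2, she wins `a` by choosing the latter; an Abelard node without successors is won by Eloise, as in the definition above.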

Now we are ready to define permutation games for weakly aconjunctive formulas $\phi$, using the DPA $\mathsf{C}_\phi = (W, \Sigma_\phi, \delta, (\{\phi\}, []), \alpha)$ from the previous section.


Definition 19 (Permutation games). Let $\phi$ be a weakly aconjunctive formula. We define the permutation game $G(\phi) = (W, E, \beta)$ to be a parity game that has the carrier of $\mathsf{C}_\phi$ as set of nodes. For every node $v \in W$ for which $\Gamma(v)$ is not a state, we fix a single rule that is to be applied to $\Gamma(v)$ and a single principal formula $\psi_v \in \Gamma(v)$ to which the rule is to be applied. If $(\vee)$ is to be applied to $\Gamma(v)$, then we put $v \in W_\exists$; otherwise, $v \in W_\forall$. In particular, all state nodes are contained in $W_\forall$. For $v \in W$, we put $E(v) = \bigcup\{\delta(v, a) \mid a \in \Sigma_v\}$, where $\Sigma_v \subseteq \Sigma_\phi$ consists of all letters $a$ that encode the application of some rule to $\Gamma(v)$ with the condition that the principal formula of the rule application must be $\psi_v$ if $v$ is not a state node. Finally, we put $\beta(v, w) = \alpha(v, a, w)$ for $(v, w) \in E$, where $a \in \Sigma_v$ encodes the rule application that leads from $v$ to $w$.


Theorem 20. Let $\phi$ be a closed, irreducible and weakly aconjunctive formula. Then we have $(\{\phi\}, []) \in \mathsf{win}_\exists(G(\phi))$ if and only if $\phi$ is satisfiable.


Proof. By construction, Eloise wins $(\{\phi\}, [])$ if and only if there is a tableau for $\phi$ (labelled by the labelling function $\Gamma$); we are done by Theorem 16.


Due to the relatively simple structure and the asymptotically smaller size of the determinized automata $\mathsf{C}_\phi$, the resulting permutation games are somewhat easier to construct and can be solved asymptotically faster than the structures created by standard satisfiability decision procedures for the full $\mu$-calculus (e.g. [5,10]), which employ the full Safra/Piterman construction; note, however, that our method is restricted to the weakly aconjunctive fragment.


Corollary 21. The satisfiability of weakly aconjunctive $\mu$-calculus formulas can be decided by solving parity games of size $O((nk)!)$ with $O(nk)$ priorities.


The winning strategies for Eloise or Abelard in these games define models for, or refutations of, the respective formulas, so that we have


Corollary 22. Satisfiable weakly aconjunctive $\mu$-calculus formulas have models of size $O((nk)!)$.


4 Implementation and Benchmarking 


We have implemented the permutation satisfiability games as an extension of the Coalgebraic Ontology Logic Reasoner (COOL) [11], a generic reasoner for coalgebraic modal logics¹. COOL achieves its genericity by instantiating an abstract reasoner that works for all coalgebraic logics to concrete instances of logics.


¹ Available at https://www8.cs.fau.de/research:software:cool.
To incorporate support for the aconjunctive coalgebraic $\mu$-calculus, we have extended the global caching algorithm that forms the core of COOL to generate and solve the corresponding permutation games, with optional on-the-fly solving; games are solved using either our own implementation of the fixpoint iteration algorithm for parity games (as in [1]) or PGSolver [8], which supports a range of game solving algorithms. Instance logics implemented in COOL currently include linear-time, relational, monotone, and alternating-time logics, as well as any logics that arise as combinations thereof. In particular, this makes COOL, to our knowledge, the only implemented reasoner for the aconjunctive fragments of the alternating-time $\mu$-calculus and Parikh’s game logic.

Although our tool supports the aconjunctive coalgebraic $\mu$-calculus, we concentrate on the standard relational aconjunctive $\mu$-calculus for experiments, as this allows us to compare our implementation with the reasoner MLSolver [9], which constructs satisfiability games using the Safra/Piterman construction and hence supports the full relational $\mu$-calculus; MLSolver uses PGSolver for game solving.

To test the implementations, we devise two series of hard aconjunctive formulas with deep alternating nesting of fixpoints. The following formulas encode that each reachable state in a Kripke structure has one of $n$ priorities (encoded by atoms $q_i$ for $1 \le i \le n$) and belongs to either Eloise ($q_e$) or Abelard ($q_a$):

$$\phi_{aut}(n) = AG \bigvee_{1 \le i \le n} \Big(q_i \wedge \bigwedge_{j \ne i} \neg q_j\Big) \qquad \phi_{game}(n) = \phi_{aut}(n) \wedge AG\big((q_e \wedge \neg q_a) \vee (\neg q_e \wedge q_a)\big)$$

Here we use $AG\,\psi$ to abbreviate $\nu X.\,(\psi \wedge \Box X)$. Then the non-emptiness regions in parity automata and Eloise’s winning region in parity games can be specified by the following aconjunctive formulas (where $\heartsuit \in \{\Diamond, \Box\}$):

$$\phi_{ne}(n) = \eta X_n. \cdots \nu X_2.\,\mu X_1.\,\psi_\Diamond \qquad \psi_\heartsuit = \bigvee_{1 \le i \le n} (q_i \wedge \heartsuit X_i)$$

$$\phi_{win}(n) = \eta X_n. \cdots \nu X_2.\,\mu X_1.\,\phi_{strat}(\psi_\heartsuit) \qquad \phi_{strat}(\psi_\heartsuit) = (q_e \wedge \psi_\Diamond) \vee (q_a \wedge \psi_\Box)$$

Furthermore, we define (for $\heartsuit \in \{\Diamond, \Box\}$)

$$\psi_\heartsuit(i) = (q_i \wedge \heartsuit Y) \vee \bigvee_{i < j \le n} (q_j \wedge \heartsuit X) \vee \bigvee_{1 \le j < i} (q_j \wedge \heartsuit Z)$$

The following series of valid formulas states that parity automata with $n$ priorities can be transformed to nondeterministic parity automata with three priorities without affecting the non-emptiness region:

$$\theta_1(n) = \phi_{aut}(n) \to \Big(\phi_{ne}(n) \to \bigvee_{i\ \text{even}} \mu X.\,\nu Y.\,\mu Z.\,\psi_\Diamond(i)\Big)$$

Similarly, if Eloise wins a parity game with $n$ priorities, then she can ensure that in each play, each odd priority $1 \le i \le n$ is visited only finitely often, unless a priority greater than $i$ is visited infinitely often (the converse does not hold in general [4]):

$$\theta_2(n) := \phi_{game}(n) \to \Big(\phi_{win}(n) \to \bigwedge_{i\ \text{odd}} \nu X.\,\mu Y.\,\nu Z.\,\phi_{strat}(\psi_\heartsuit(i))\Big)$$
Additionally, we devise two series of unsatisfiable formulas that exhibit the advantages of COOL’s global caching and on-the-fly solving capabilities. These formulas are inspired by the CTL formula series $\mathit{early}(n, j, k)$ and $\mathit{early}_{gc}(n, j, k)$ from [13] but contain fixpoint alternation of depth $2^k$ inside the subformula $\theta$:

$$\begin{aligned}
\textit{early-ac}(n, j, k) &= start_p \wedge init(p, n) \wedge init(r, k) \wedge AG\big((r \to c(r, k)) \wedge (p \to c(p, n))\big) \wedge{} \\
&\quad AG\Big(\big(\textstyle\bigwedge_{0 \le i < j} p_i \to \Diamond(start_r \wedge \theta)\big) \wedge \neg(p \wedge r) \wedge (r \to \Box r)\Big) \\
\textit{early-ac}_{gc}(n, j, k) &= \textit{early-ac}(n, j, k) \wedge b \wedge init(q, n) \wedge AG\big(\neg(p \wedge q) \wedge \neg(q \wedge r)\big) \wedge{} \\
&\quad AG\big((q \to c(q, n)) \wedge AF\,b \wedge (b \to (\Diamond p \wedge \Diamond start_q \wedge \Box \neg b))\big) \\
init(x, m) &= AG\big((start_x \to (x \wedge \textstyle\bigwedge_{0 \le i < m} \neg x_i)) \wedge (x \to \Diamond x)\big) \\
\theta &= \eta X_{2^k}. \cdots \nu X_2.\,\mu X_1.\,\textstyle\bigvee_{1 \le i \le 2^k} \big(bin(r, i - 1) \wedge \Diamond X_i\big),
\end{aligned}$$

where $c(x, m)$ encodes an $m$-bit counter using atoms $x_0, \dots, x_{m-1}$ and $bin(r, i)$ denotes the binary encoding of the number $i$ using atoms $r_0, \dots, r_{k-1}$. The formulas $\textit{early-ac}(n, j, k)$ specify a loop $p$ of length $2^n$ that branches after $j$ steps to a second loop $r$ of length $2^k$ on which the highest value of the counter (which counts from 0 to $2^k - 1$ and then restarts at 0) is required to be an even number. For constant $k$, the contradiction on loop $r$ yields a small refutation which can be found early, using on-the-fly solving. The formulas $\textit{early-ac}_{gc}(n, j, k)$ extend this specification by stating that a third loop $q$ of length $2^n$ is started from loop $p$ infinitely often. Procedures with sufficient caching capabilities will have to (partially) explore this loop at most once.
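The subformula $\theta$ grows quickly with $k$, so such benchmark formulas are best produced mechanically. The following Python sketch generates $bin(r, i)$ and $\theta$ in an ad hoc textual syntax (`mu`/`nu` binders, `<>` for $\Diamond$, `&`, `|`, `~`); this syntax is our own assumption and is not the input format of COOL or MLSolver, and we also assume that $r_0$ is the least significant bit. The function names are hypothetical.

```python
from typing import List


def bin_literals(atom: str, i: int, width: int) -> List[str]:
    """bin(atom, i): literals fixing atom0..atom{width-1} to the binary digits of i.

    atom0 is assumed to be the least significant bit.
    """
    if not 0 <= i < 2 ** width:
        raise ValueError("i does not fit into the given number of bits")
    return [f"{atom}{b}" if (i >> b) & 1 else f"~{atom}{b}" for b in range(width)]


def theta(k: int) -> str:
    """eta X_{2^k}. ... nu X_2. mu X_1. OR_{1<=i<=2^k} (bin(r, i-1) & <>X_i).

    X_i is bound by mu for odd i and by nu for even i; the binder of the
    outermost variable X_{2^k} comes first.
    """
    n = 2 ** k
    binders = "".join(f"{'mu' if i % 2 == 1 else 'nu'} X{i}. " for i in range(n, 0, -1))
    disjuncts = [f"(({' & '.join(bin_literals('r', i - 1, k))}) & <>X{i})"
                 for i in range(1, n + 1)]
    return binders + "(" + " | ".join(disjuncts) + ")"
```

For $k = 1$, `theta(1)` returns `nu X2. mu X1. (((~r0) & <>X1) | ((r0) & <>X2))`; the number of disjuncts, and hence the alternation depth, doubles with each increment of $k$.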

We compare the runtimes of MLSolver and COOL on the formulas described 
above; we let COOL and MLSolver solve games using the local strategy improve- 
ment algorithm stratimprloc2 provided by PGSolver. To solve games on-the-fly 
with COOL however, we use our own implementation of the fixpoint iteration 
algorithm, which in general is slower than PGSolver but has the advantage that
it enables on-the-fly solving. With this option enabled, COOL constructs and 
solves the satisfiability games step by step and finishes as soon as one of the play- 
ers has a winning strategy in the partial game. For COOL, we have conducted all 
experiments with and without on-the-fly solving. For MLSolver, we also enabled 
the optimizations -opt litpro and -opt comp (and refer to the resulting prover 
configuration as MLSolverOpt). Tests have been run on a system with Intel Core 
i7 3.60 GHz CPU with 16 GB RAM. A more detailed description of the results of 
the experiments as well as binaries of a formula generator, the prover COOL and 
scripts that benchmark the various configurations of the provers are available in 
a figshare repository at [12]. 

We observe that COOL without on-the-fly solving generally finishes faster than both MLSolver and MLSolverOpt throughout all tested series of formulas (see Figs. 1-4); the reason for this appears to be that the permutation games solved by COOL are of size $O((nk)!)$, where $n < k$, and hence asymptotically smaller than the Safra/Piterman games solved by MLSolver, which are of size $O(((nk)!)^2)$. The size of the refutations for the formulas $\theta_1(n)$ and $\theta_2(n)$ is exponential in $n$, so that on-the-fly solving does in fact increase the runtimes of COOL (see Figs. 1 and 2); basically, these formulas cannot be decided early, and therefore any (necessarily unsuccessful) attempt to do so just consumes additional computation time. The formulas $\textit{early-ac}(n, 4, 2)$ and $\textit{early-ac}_{gc}(n, 4, 2)$, on the other hand, have refutations of size polynomial in $n$, and COOL appears to benefit from on-the-fly solving for these formulas as it is able to decide them early (see Figs. 3 and 4). As mentioned above, COOL uses our own unoptimized implementation of the fixpoint iteration algorithm [1] for on-the-fly solving; while this implementation is slower than PGSolver’s stratimprloc2 algorithm, the on-the-fly abilities of COOL seem to compensate for this disadvantage on the $\textit{early-ac}(n, 4, 2)$ and $\textit{early-ac}_{gc}(n, 4, 2)$ formulas from $n = 11$ and $n = 8$ on, respectively.
5 Conclusion


We have presented a method to obtain satisfiability games for the weakly aconjunctive $\mu$-calculus. The game construction uses determinization of limit-deterministic parity automata, avoiding the full complexity of the Safra/Piterman construction a) in the presentation of the procedure and its correctness proof and b) in the size of the obtained DPA (which comes down from $O(((nk)!)^2)$ to $O((nk)!)$). The resulting permutation satisfiability games for the weakly aconjunctive $\mu$-calculus are of size $O((nk)!)$, have $O(nk)$ priorities, and yield a new bound of $O((nk)!)$ on the model size for this fragment. We have implemented this decision procedure in coalgebraic generality and with support for on-the-fly solving as part of the coalgebraic satisfiability solver COOL; initial experiments show favourable results.


The datasets generated and analyzed during the current study are available 
in the figshare repository: https://doi.org/10.6084/m9.figshare.5919451.v1. 
Abstract. Model checking large networks of processes is challenging due
to state explosion. In many cases, individual processes are isomorphic, 
but there is insufficient global symmetry to simplify model checking. This 
work considers the verification of local properties, those defined over the 
neighborhood of a process. Considerably generalizing earlier results on 
invariance, it is shown that all local mu-calculus properties, including 
safety and liveness properties, are preserved by neighborhood symme- 
tries. Hence, it suffices to check them locally over a set of representative 
process neighborhoods. In general, local verification approximates veri- 
fication over the global state space; however, if process interactions are 
outward-facing, the relationship is shown to be exact. For many network 
topologies, even those with little global symmetry, analysis with repre- 
sentatives provides a significant, even exponential, reduction in the cost 
of verification. Moreover, it is shown that for network families generated 
from building-block patterns, neighborhood symmetries are easily deter- 
mined, and verification over the entire family reduces to verification over 
a finite set of representative process neighborhoods. 


1 Introduction 


Networks of communicating processes are a model for distributed systems, cloud 
computing environments, routing protocols, many-core hardware processors, and 
other such systems. Often, networks are described parametrically, that is, a pro- 
cess template is instantiated at each node of a network graph. The expectation 
then is that basic correctness properties should hold regardless of the size and 
the shape of the network. 

Model checkers can determine, fully automatically, whether a fixed instance 
of a process network satisfies a correctness property. However, model checking 
suffers from exponential state explosion as the size of the analyzed network 
increases. Thus, one may aim for parametric analysis of a network family, “in one fell swoop”; however, the parametric model checking problem (PMCP) is undecidable in general [2]. Limiting to compositional proofs makes parametrized verification more tractable; as shown in [20], the PCMCP (Parameterized Compositional Model Checking Problem) can be solved efficiently for standard network families (rings, tori, wrap-around mesh, etc.) where the PMCP is undecidable even for invariance properties.

In this work, we generalize these results considerably, from invariance to mu- 
calculus properties. We formulate a local version of the mu-calculus to describe 
behaviors of a single process and its immediate neighborhood. The logic allows 
specification of safety and liveness properties, each property being limited to 
assertions over a fixed process neighborhood — e.g., “A hungry philosopher even- 
tually acquires all adjacent forks”. The goal of this work is a method to prove 
such properties for all processes in a network and, moreover, to prove properties 
parametrically, i.e., for all networks in a family. 

Our analysis is based on a grouping of processes by local symmetry, where “balanced” processes have (recursively) similar neighborhoods [17,18,20]. Such symmetries are common in parametric network structures, for example [18,19], cf. [17,20]. We establish that the local state spaces of balanced processes are sufficiently bisimilar that they satisfy the same local mu-calculus properties. It is, therefore, enough to model-check a representative process from each balance class, while paying particular attention to ‘interference’ transitions from neighboring processes.

We show that any universal local mu-calculus property established locally 
also holds on the global state space. Thus, a universal property can be estab- 
lished globally for all processes by checking it on the local state spaces of a few 
representatives. 

Many communication protocols are designed in such a way that a typical 
process must offer a given set of input/output services to its communication 
environment, irrespective of its internal state. We show that under such outward- 
facing interactions, the correspondence is exact: a local mu-calculus property 
holds globally if, and only if, it holds locally. 

We also detail the implications for entire families of networks that are defined 
by ‘symmetry patterns.’ For instance, a network family with a transitive global 
symmetry group can be analyzed by examining a single representative node. 
Such dramatic reductions in complexity are generally not possible for non-local 
properties. 

None of the symmetry reduction results rely in any essential manner on the 
processes being finite-state. To summarize the main results: 


— The local state spaces of balanced processes (the spaces incorporate interfer- 
ence from neighbors) are bisimilar. Hence, it suffices to model-check properties 
on representative processes of the balance equivalence classes, 

— The local state space simulates the global space up to stuttering. Thus, a 
universal local mu-calculus property holds on the global space if it holds on 
a representative local space, 

— With ‘outward-facing’ interaction, the local and global spaces are stuttering- 
bisimilar. A local mu-calculus property holds on the global space if, and only 
if, it holds on a representative local space. 


We also explore the implications of these results and, in particular, show 
that in several settings, local symmetries can be determined easily from process 
syntax. We show that for isomorphic ‘normal’ processes operating in a network
whose communication graph has at least transitive symmetry, a balance relation 
with a single representative process can be generated from the syntactic descrip- 
tion of the network. In another direction, we show that for networks formed 
from ‘building block’ patterns, the pattern instances serve as balance represen- 
tatives. These direct, syntactic constructions avoid having to build global sym- 
metry reduced structures, can lead to exponential reductions in the cost of model 
checking, and apply to many networks where global symmetry reduction tech- 
niques are ineffective. Moreover, entire network families can be model-checked 
via the analysis of a small number of representative processes, so that the savings 
in the cost of analysis are unbounded. 


2 Preliminaries 


Processes and Networks: Syntax. A network is a directed graph, defined by a set of nodes, $N$, a set of edges, $E$, and two connection relations: $Out \subseteq N \times E$ and $In \subseteq N \times E$. Connections are directed from node $n$ to the edges in $Out(n)$, and directed inwards from the edges in $In(n)$ to $n$. Nodes $m$ and $n$ are neighbors, denoted $nbr(n, m)$, if they have a common connected edge. Node $m$ points to node $n$ if there is an edge $e$ in $Out(m) \cap In(n)$.

A process is defined by a tuple (V, I, T), where V is a set of variables which 
defines its local state space; I(V) is a Boolean predicate defining the initial 
states; and T(V,V’) is a Boolean predicate defining the state transitions, using 
a copy V’ to denote the next state. Variables are partitioned into internal and 
external variables. External variables are labeled as read, or write, or both. The 
transition relation is required to preserve the value of read-only variables and its 
enabledness cannot depend on the values of write-only variables. 

A process network $P$ is defined by a network graph, a set of processes, and an assignment, $\epsilon$. Every node $n$ is assigned a process $\epsilon(n)$, which we denote for convenience by $P_n = (V_n, I_n, T_n)$. Each edge $e$ is assigned a variable $\epsilon(e)$ in $V = (\cup n : V_n)$. The assignment $\epsilon$ must assign $In(n)$ to the read variables in $V_n$, $Out(n)$ to the write variables of $V_n$, and the internal variables of $V_n$ to no network edge. The shared variables of processes $P_m$ and $P_n$ are those assigned to common connected edges of $m$ and $n$.


Processes and Networks: Semantics. Semantically, the behavior of a process network $P$ is defined as the process $P = (V, I, T)$, where $V = (\cup n : V_n)$, $I = (\wedge n : I_n)$, and $T = (\vee n : T_n \wedge unchanged(V \setminus V_n))$. This defines an interleaving semantics, with $unchanged(W)$ denoting that the values of variables in $W$ are unchanged.

A global state is a function mapping variables in $V$ to values in their domain. A local state of $P_n$ is a function mapping the variables in $V_n$ to values in their domain. An internal state of $P_n$ is a function mapping the internal variables of $P_n$ to values in their domains.
For neighbors $m, n$, a joint state is a pair $x = (x_m, x_n)$, where $x_m$ and $x_n$ are local states of processes $P_m$ and $P_n$, respectively, such that $x_m$ and $x_n$ have the same value for all shared variables. The transition relation $T_n$ is extended to joint states as $T_n(x, y)$, which holds iff $T_n(x_n, y_n)$ holds and the values of variables in $P_m$ that are not shared with $P_n$ are unchanged.


Invariants: Global and Compositional. Invariance is central to reasoning about dynamic system behavior. For a process network $P$ as defined above, a global assertion, $\theta$, is a set of global states of $P$. It is an inductive invariant for $P$ if all initial states are in $\theta$, i.e., $[I(x) \Rightarrow \theta(x)]$, and $\theta$ is closed under transitions, i.e., $[\theta(x) \wedge T(x, y) \Rightarrow \theta(y)]$.¹

In place of a single invariance assertion, compositional reasoning postulates a set of local assertions, $\{\theta_n\}$, where $\theta_n$ is a set of local states of $P_n$, for each $n$. This set is a compositional inductive invariant if, for all $n$:


(Init) The initial states of $P_n$ are included in $\theta_n$. That is, $[I_n(x_n) \Rightarrow \theta_n(x_n)]$.

(Step) Transitions of $P_n$ preserve $\theta_n$. That is, $[\theta_n(x_n) \wedge T_n(x_n, y_n) \Rightarrow \theta_n(y_n)]$.

(Non-Interference) Assertion $\theta_n$ is preserved by transitions of neighbors $P_m$, from every joint state satisfying both $\theta_m$ and $\theta_n$. I.e., for all $m$ such that $\mathit{nbr}(n, m)$ and all joint states $x = (x_n, x_m)$, $y = (y_n, y_m)$: $[\theta_n(x_n) \wedge \theta_m(x_m) \wedge T_m(x, y) \Rightarrow \theta_n(y_n)]$.


These constraints are in a simultaneous pre-fixpoint form over $\{\theta_n\}$. The least fixpoint is the strongest compositional invariant. For finite-state processes, this computation is polynomial-time in the size of the local state spaces.
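To make the fixpoint concrete, here is a minimal Python sketch that computes the least solution of (Init), (Step) and (Non-Interference) by plain iteration over explicit sets of local states. The encoding (local states as frozen dictionaries, one transition function per node, edge variables shared by name) and the small token-ring instance borrowed from Example 1 in Section 6 are assumptions made for illustration, not the authors' implementation; the local initial condition deliberately leaves edge values unconstrained.

```python
def freeze(d):
    """Local states are frozen dicts (variable -> value) so they can live in sets."""
    return frozenset(d.items())


def strongest_compositional_invariant(vars_of, init, trans, nbrs):
    """Least solution of (Init), (Step) and (Non-Interference), by iteration.

    vars_of[n]: variable names of P_n; an edge variable has the same name at both ends
    init[n]:    set of initial local states of P_n (frozen dicts)
    trans[n]:   function from a local-state dict to a list of successor dicts
    nbrs[n]:    set of neighbours of n
    """
    theta = {n: set(init[n]) for n in vars_of}  # (Init)
    changed = True
    while changed:
        changed = False
        for n in vars_of:
            new = set()
            for x in theta[n]:  # (Step)
                new.update(freeze(y) for y in trans[n](dict(x)))
            for m in nbrs[n]:  # (Non-Interference)
                shared = vars_of[n] & vars_of[m]
                for x in theta[n]:
                    xd = dict(x)
                    for u in theta[m]:
                        ud = dict(u)
                        if any(xd[v] != ud[v] for v in shared):
                            continue  # (x, u) is not a joint state
                        for u2 in trans[m](ud):
                            # Only the shared variables of P_n can change.
                            new.add(freeze({**xd, **{v: u2[v] for v in shared}}))
            if not new <= theta[n]:
                theta[n] |= new
                changed = True
    return theta


# Hypothetical encoding of the token ring of Example 1:
# P_i has internal variable s_i and edge variables x_i (left) and x_{i+1} (right).
BOT, TOK = "bot", "tok"


def ring_process(s, l, r):
    def trans(d):
        if d[s] == "T" and d[l] == TOK:
            return [{**d, s: "H"}, {**d, l: BOT, r: TOK}]  # enter H, or pass token
        if d[s] == "H" and d[l] == TOK:
            return [{**d, s: "E"}]
        if d[s] == "E":
            return [{**d, s: "T", l: BOT, r: TOK}]  # exit and pass token right
        return []
    return trans


N = 3
vars_of, init, trans, nbrs = {}, {}, {}, {}
for i in range(N):
    s, l, r = f"s{i}", f"x{i}", f"x{(i + 1) % N}"
    vars_of[i] = {s, l, r}
    # Locally, only "internal state T" is fixed initially; edge values are left open.
    init[i] = {freeze({s: "T", l: a, r: b}) for a in (BOT, TOK) for b in (BOT, TOK)}
    trans[i] = ring_process(s, l, r)
    nbrs[i] = {(i - 1) % N, (i + 1) % N}

theta = strongest_compositional_invariant(vars_of, init, trans, nbrs)
for i in range(N):
    # Local form of mutual exclusion: E_i implies x_i = tok.
    print(i, all(dict(x)[f"x{i}"] == TOK for x in theta[i] if dict(x)[f"s{i}"] == "E"))
```

Each pass only adds states and the local state spaces are finite, so the loop terminates; the result is closed under steps and interference by construction.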


Theorem 1 [17]. If $\{\theta_n\}$ is a compositional inductive invariant then $\bigwedge_i \theta_i$ is a global inductive invariant.


Symmetry Between Neighborhoods. A neighborhood symmetry between nodes $m$ and $n$ is witnessed by a bijection, $\beta$, which maps edges in $\mathit{In}(m)$ to those in $\mathit{In}(n)$ and edges in $\mathit{Out}(m)$ to those in $\mathit{Out}(n)$; we call $(m, \beta, n)$ a similarity. The set of similarities $(m, \beta, n)$ is a groupoid.²

A balance relation ([17], cf. [11]) links symmetries throughout a network: balanced nodes $m, n$ have isomorphic neighborhoods, nodes connected to corresponding edges of $m, n$ are themselves balanced, and so on. Formally, a balance relation, $B$, is a set of triples $(m, \beta, n)$, such that $(m, \beta, n)$ is a similarity; $(n, \beta^{-1}, m)$ is in $B$; and for any node $k$ that points to $m$, there is a node $l$ which points to $n$ and a bijection $\gamma$ such that $(k, \gamma, l)$ is in $B$, and $\gamma(e) = \beta(e)$ for every edge $e$ that is connected to both $m$ and $k$.

The structure of this condition is similar to that of bisimulation (it is co-inductive); thus, there is a greatest fixpoint, which is the largest balance relation. Nodes $m, n$ are balanced if $(m, \beta, n)$ is in the largest balance relation for some $\beta$.
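Because the balance condition is co-inductive, on a finite network the largest balance relation can be computed by starting from all similarities and deleting triples that violate the conditions until nothing changes, much like refinement for bisimulation. The Python sketch below is hypothetical: it enumerates bijections explicitly (exponential in node degree), stores each $\beta$ as a frozenset of edge pairs, and assumes $\mathit{In}(n)$ and $\mathit{Out}(n)$ are disjoint for every node.

```python
from itertools import permutations


def bijections(src, dst):
    """All bijections from edge set src onto edge set dst, as dicts."""
    src, dst = sorted(src), sorted(dst)
    if len(src) != len(dst):
        return []
    return [dict(zip(src, p)) for p in permutations(dst)]


def largest_balance(nodes, In, Out):
    """Greatest balance relation on a finite network, by deleting violating triples.

    Triples are (m, beta, n) with beta stored as a frozenset of (edge, edge) pairs.
    Assumes In[n] and Out[n] are disjoint for every node n.
    """
    def edges(n):
        return In[n] | Out[n]

    def points_to(k, m):
        return bool(Out[k] & In[m])

    # Start from every similarity: In(m) -> In(n) together with Out(m) -> Out(n).
    B = {(m, frozenset({**bi, **bo}.items()), n)
         for m in nodes for n in nodes
         for bi in bijections(In[m], In[n])
         for bo in bijections(Out[m], Out[n])}
    changed = True
    while changed:
        changed = False
        for (m, beta, n) in list(B):
            bd = dict(beta)
            # The inverse triple must also be present.
            ok = (n, frozenset((f, e) for e, f in beta), m) in B
            for k in nodes:
                if not ok:
                    break
                if not points_to(k, m):
                    continue
                # Some l pointing to n with (k, gamma, l) in B,
                # where gamma agrees with beta on the edges shared by m and k.
                common = edges(m) & edges(k)
                ok = any(k2 == k and points_to(l, n)
                         and all(dict(g)[e] == bd[e] for e in common)
                         for (k2, g, l) in B)
            if not ok:
                B.discard((m, beta, n))
                changed = True
    return B


ring_in = {0: {"e2"}, 1: {"e0"}, 2: {"e1"}}   # directed ring 0 -> 1 -> 2 -> 0
ring_out = {0: {"e0"}, 1: {"e1"}, 2: {"e2"}}
print(sorted((m, n) for m, _, n in largest_balance([0, 1, 2], ring_in, ring_out)))
```

On a directed 3-node ring every pair of nodes ends up balanced, while on a directed 4-node path the two middle nodes are similar but not balanced, because their predecessors are not.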


¹ The notation $[\varphi]$, from Dijkstra and Scholten [7], means that $\varphi$ is valid.
² I.e., $(n, \iota, n)$ is a similarity for the identity map $\iota$; if $(m, \beta, n)$ is a similarity, so is $(n, \beta^{-1}, m)$; and if $(m, \beta, q)$ and $(q, \gamma, n)$ are similarities, so is $(m, \gamma\beta, n)$.
A process network $P$ respects balance relation $B$ if balanced nodes are assigned processes with isomorphic initial states and transition relations: i.e., for all $(m, \beta, n) \in B$, it is the case that $[I_n(\beta(s)) \equiv I_m(s)]$ for all $s$, and $[T_n(\beta(s), \beta(t)) \equiv T_m(s, t)]$ for all $s, t$. Similarly, we say that local assertions $\{\theta_i\}$ respect $B$ if $[\theta_n(\beta(s)) \equiv \theta_m(s)]$ for all $(m, \beta, n) \in B$. We abbreviate these conditions as $[I_n = \beta(I_m)]$, $[T_n = \beta(T_m)]$ and $[\theta_n = \beta(\theta_m)]$, respectively. Here, $\beta$ is overloaded to permute local states of $P_m$. For a local state $s$ of node $m$, the local state $\beta(s)$ at node $n$ is defined as follows: the internal states of $m$ in $s$ and $n$ in $\beta(s)$ are identical and, for every edge $e$ connected to $m$, the value on $e$ in $s$ is identical to the value on $\beta(e)$ in $\beta(s)$. A key result is that balanced nodes have isomorphic compositional invariants.
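The overloaded action of $\beta$ on local states, and the check $[\theta_n = \beta(\theta_m)]$ for finite assertions, can be sketched as follows. Encoding a local state as an (internal state, edge valuation) pair, and the function names `apply_beta` and `respects`, are assumptions made for illustration.

```python
def apply_beta(beta, state):
    """Map a local state of node m to node n along beta.

    Hypothetical encoding: a local state is (internal, frozenset of (edge, value)).
    The internal part is copied; the value on edge e is moved to edge beta[e].
    """
    internal, edge_values = state
    return (internal, frozenset((beta[e], v) for e, v in edge_values))


def respects(beta, theta_m, theta_n):
    """Check [theta_n = beta(theta_m)] for finite sets of local states."""
    return {apply_beta(beta, s) for s in theta_m} == set(theta_n)


# Node 0 of a directed ring (In = {e2}, Out = {e0}) mapped to node 1 (In = {e0}, Out = {e1}).
beta = {"e2": "e0", "e0": "e1"}
s = ("E", frozenset({("e2", "tok"), ("e0", "bot")}))
print(apply_beta(beta, s))
```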


Theorem 2 ([17]). If a process network respects balance relation $B$, its strongest compositional invariant also respects $B$.


This theorem implies that it suffices to compute the strongest compositional invariant only for representative nodes,³ as the invariants for all other nodes are isomorphic to those of their representatives.
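Once balanced pairs are known, picking one representative per class of $\sim_B$ is a union-find exercise. This hypothetical helper assumes the balanced pairs $(m, n)$ are already given (for instance, read off a largest balance relation) and ignores the isomorphisms $\gamma$ that footnote 3 attaches to each non-representative node.

```python
def balance_classes(nodes, balanced_pairs):
    """Classes of ~_B from known balanced pairs; the smallest node represents each class."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for m, n in balanced_pairs:
        parent[find(m)] = find(n)
    classes = {}
    for n in nodes:
        classes.setdefault(find(n), []).append(n)
    return {min(c): sorted(c) for c in classes.values()}


# Hypothetical 6-node ring alternating two process types: even vs. odd nodes.
print(balance_classes(range(6), [(0, 2), (2, 4), (1, 3), (3, 5)]))
```

In this example the compositional fixpoint would be computed only for nodes 0 and 1.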


3 The Local Mu-Calculus 


Intuitively, a local property is one that refers to the local state of a node, e.g., “the process at node $n$ is in its critical section”, or “the philosopher at node $n$ holds all adjacent forks”. We are interested in establishing a local property $f(n)$, parameterized by node $n$ (and so isomorphic between nodes), for all nodes of a process network. We represent such a property by a mu-calculus formula. This has two interpretations: one in the global state space, the other in a compositionally constructed local state space. Their connections are discussed in the next section.


3.1 Syntax 


The syntax and semantics of the local mu-calculus are largely identical to those of the standard mu-calculus [15]. The only difference is the use of the $E[U]$ operator in place of $EX$; this operator is given a stuttering-insensitive semantics.

Let $\Sigma$ be a set of atomic propositions, $\Gamma$ a set of propositional variables, and $A$ a set of transition labels; these sets are mutually disjoint. Local mu-calculus formulas are defined by the following grammar. A formula is one of


- An atomic proposition from $\Sigma$,
- A propositional variable from $\Gamma$,
- $\neg\varphi$, for a formula $\varphi$,
- $\varphi \wedge \psi$, the conjunction of formulae $\varphi$ and $\psi$,
- $E[\varphi \, U_a \, \psi]$, where $\varphi, \psi$ are formulas, and $a$ is a transition label from $A$,
- $\mu Z.\varphi(Z)$, where $\varphi(Z)$ is a formula syntactically monotone in $Z$ (i.e., all occurrences of $Z$ fall under an even number of negations).

³ A balance relation $B$ induces the equivalence relation $m \sim_B n$ if $(m, \beta, n) \in B$ for some $\beta$. The compositional fixpoint is calculated for a representative of each class of $\sim_B$. In the fixpoint calculation, the assertion $\theta_n$ is replaced by $\gamma(\theta_r)$, where $r$ is the representative for $n$, and $\gamma$ is a chosen isomorphism such that $(r, \gamma, n)$ is in $B$.


Operators $A[\varphi \, W_a \, \psi] = \neg E[\neg\varphi \, U_a \, \neg\psi]$ and $\nu Z.\varphi(Z) = \neg\mu Z.(\neg\varphi(\neg Z))$ are the negation duals of $E[U]$ and $\mu$, respectively, with Boolean operations $\vee$ and $\Rightarrow$ defined as usual.


3.2 Semantics 


A state space has the form $(S, S_0, R, L)$, where $S$ is a set of states, $S_0$ is the set of initial states, $R \subseteq S \times (A \cup \{\tau\}) \times S$ is a left-total transition relation, and $L : S \to 2^{\Sigma}$ labels states with atomic propositions. A path is a sequence $s_0, a_0, s_1, a_1, \ldots$ such that $(s_i, a_i, s_{i+1}) \in R$ for all $i$, where the subsequence $a_0, a_1, \ldots$ is the label sequence of the path.

The state set $S$ generates a complete lattice of all subsets of $S$, ordered by set inclusion. A functional $F : 2^S \to 2^S$ is monotone if for all $X, Y$ such that $X \subseteq Y$ it is the case that $F(X) \subseteq F(Y)$. By the Knaster-Tarski theorem, every monotone functional has a least and a greatest fixpoint. Consider a formula $\varphi(Z_1, \ldots, Z_n)$ with free variables $Z_1, \ldots, Z_n$. Given an assignment $\lambda$ mapping each free variable to a subset of $S$, the interpretation of $\varphi$ under $\lambda$ is defined inductively as follows. We write $M, s \models \varphi$ to mean that state $s$ in space $M$ satisfies a closed formula $\varphi$, i.e., $s$ is in $\mathit{interp}(\varphi, \epsilon)$, where $\epsilon$ is the empty interpretation.


- $\mathit{interp}(p, \lambda) = \{s \in S \mid p \in L(s)\}$, for proposition $p \in \Sigma$,
- $\mathit{interp}(Z, \lambda) = \lambda(Z)$,
- $\mathit{interp}(\varphi \wedge \psi, \lambda) = \mathit{interp}(\varphi, \lambda) \cap \mathit{interp}(\psi, \lambda)$,
- $\mathit{interp}(\neg\varphi, \lambda) = S \setminus \mathit{interp}(\varphi, \lambda)$,
- State $s$ is in $\mathit{interp}(E[\varphi \, U_a \, \psi], \lambda)$ if, and only if, there is a finite path $\pi$ from $s$ to a state $t$ with label sequence $\tau^*; a$, where $t$ is in $\mathit{interp}(\psi, \lambda)$ and every other state $s'$ on $\pi$ is in $\mathit{interp}(\varphi, \lambda)$. Informally, $\varphi$ holds until the first $a$-action, after which $\psi$ is true.
- $\mathit{interp}(\mu Z.\varphi(Z), \lambda)$ is the least fixpoint of the functional $F(X) = \mathit{interp}(\varphi(Z), \lambda')$, where $\lambda'$ extends $\lambda$ with the assignment of $X$ to $Z$.
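A small explicit-state evaluator may help fix these definitions, in particular the $\tau^*; a$ shape of $E[\varphi \, U_a \, \psi]$: the satisfying states are the $\varphi$-states with an $a$-step into $\psi$, closed backwards under $\tau$-steps through $\varphi$-states. The tuple encoding of formulas, the label name `"tau"` and the `interp` signature are assumptions of this sketch; $\mu$ is computed by Kleene iteration from the empty set, which suffices on a finite state set for monotone bodies, and $\vee$ and $\nu$ can be derived through the dualities given earlier.

```python
TAU = "tau"  # the silent label (hypothetical name)


def interp(f, S, R, L, env=None):
    """States of (S, R, L) satisfying formula f under the assignment env.

    Formulas are tuples (hypothetical encoding): ("prop", p), ("var", Z),
    ("not", f), ("and", f, g), ("EU", f, a, g) for E[f U_a g], ("mu", Z, f).
    R is a set of (source, label, target) triples; L maps states to label sets.
    """
    env = env or {}
    kind = f[0]
    if kind == "prop":
        return {s for s in S if f[1] in L[s]}
    if kind == "var":
        return set(env[f[1]])
    if kind == "not":
        return set(S) - interp(f[1], S, R, L, env)
    if kind == "and":
        return interp(f[1], S, R, L, env) & interp(f[2], S, R, L, env)
    if kind == "EU":
        phi, a, psi = interp(f[1], S, R, L, env), f[2], interp(f[3], S, R, L, env)
        # Last step: a phi-state with an a-labelled edge into psi.
        Y = {s for (s, lab, t) in R if lab == a and s in phi and t in psi}
        # Then extend backwards along tau-edges through phi-states.
        while True:
            more = {s for (s, lab, t) in R if lab == TAU and s in phi and t in Y}
            if more <= Y:
                return Y
            Y |= more
    if kind == "mu":
        Z, body = f[1], f[2]
        X = set()
        while True:  # Kleene iteration from the empty set
            nxt = interp(body, S, R, L, {**env, Z: X})
            if nxt == X:
                return X
            X = nxt
    raise ValueError(f"unknown operator {kind!r}")


S = {0, 1, 2, 3}
R = {(0, TAU, 1), (1, "a", 2), (2, TAU, 2), (3, "a", 3)}
L = {0: {"ok"}, 1: {"ok"}, 2: {"done"}, 3: set()}
print(sorted(interp(("EU", ("prop", "ok"), "a", ("prop", "done")), S, R, L)))  # [0, 1]
```

State 0 qualifies only because of the silent step to state 1, which is what makes the operator stuttering-insensitive.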


3.3 Local and Global Interpretations 


Let $\theta$ be a compositional invariant respecting a balance relation $B$. For any node $n$ of the network, define $H^\theta_n$ as the following transition system:


- The states are the local states of $P_n$ that satisfy $\theta_n$,
- A transition $(s, s')$ is either
  - a transition (labeled with $n$) by $P_n$ from state $s$, or
  - an interference transition (labeled with $m$) by a neighbor $P_m$ from a joint state $(s, u)$, where $\theta_n(s)$ and $\theta_m(u)$ hold, to a joint state $(s', u')$.

By the properties of a compositional invariant, $s'$ is in $\theta_n$ in both cases.
The only missing ingredient is a labeling of the states with atomic propositions.
Given such a labeling, L, a closed formula evaluates to a set of local states. 
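Assuming a compositional invariant $\theta$ is already available as explicit sets of local states, the transition system $H^\theta_n$ can be assembled directly from its two kinds of transitions. The encoding (frozen dictionaries, a transition function per node, edge variables shared by name) and the function name `local_space` are hypothetical.

```python
def freeze(d):
    return frozenset(d.items())


def local_space(n, theta, vars_of, trans, nbrs):
    """States and labelled transitions of H^theta_n (hypothetical encoding).

    theta[k] is a set of frozen local states; trans[k] maps a state dict to a list
    of successor dicts. Steps of P_n are labelled n, interference by m is labelled m.
    """
    R = set()
    for x in theta[n]:
        xd = dict(x)
        for y in trans[n](xd):
            R.add((x, n, freeze(y)))  # step transition
        for m in nbrs[n]:
            shared = vars_of[n] & vars_of[m]
            for u in theta[m]:
                ud = dict(u)
                if all(xd[v] == ud[v] for v in shared):  # (x, u) is a joint state
                    for u2 in trans[m](ud):
                        y = {**xd, **{v: u2[v] for v in shared}}
                        R.add((x, m, freeze(y)))  # interference transition
    return set(theta[n]), R


# Two nodes sharing one edge variable x: node 0 sets x to 1, node 1 resets it to 0.
vars_of = {0: {"a", "x"}, 1: {"b", "x"}}
trans = {0: lambda d: [{**d, "x": 1}] if d["x"] == 0 else [],
         1: lambda d: [{**d, "x": 0}] if d["x"] == 1 else []}
nbrs = {0: {1}, 1: {0}}
theta = {0: {freeze({"a": 0, "x": v}) for v in (0, 1)},
         1: {freeze({"b": 0, "x": v}) for v in (0, 1)}}
states, R = local_space(0, theta, vars_of, trans, nbrs)
print(len(states), len(R))
```

In the example, node 0 has one step (setting $x$ to 1) and one interference transition by node 1 (resetting $x$ to 0), and both targets stay inside $\theta_0$.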

The global transition system $G$ defines the semantics of the process network. For a given $n$, let $G_n$ be $G$ with transitions by $P_n$ labeled with $n$, transitions by neighbors $m$ of $n$ labeled with $m$, and all other transitions (which cannot change the local state of $P_n$) labeled with $\tau$. A local labeling $L$ of $P_n$ is extended to $G_n$ by labeling a global state $s$ with proposition $p$ if $p$ labels the local state of $P_n$ in $s$. Formulas local to node $n$ are evaluated over $G_n$. A closed formula evaluates to a set of global states.


3.4 Simulation and Bisimulation 


For processes without $\tau$ actions, a simulation relation $\alpha$ from process $P$ to process $Q$ is a relation from the state space of $P$ to that of $Q$, satisfying:


- Every initial state of $P$ is related to an initial state of $Q$ by $\alpha$, and
- If $s \,\alpha\, t$ holds, then $s$ and $t$ satisfy the same atomic propositions, and
- If $s \,\alpha\, t$ holds and $s'$ is a successor state of $s$ in $P$, there is a successor state $t'$ of $t$ in $Q$ such that $s' \,\alpha\, t'$ holds.


If a simulation relation exists from $P$ to $Q$, we say that $Q$ simulates $P$. It is well known that if $Q$ simulates $P$, then any standard universal mu-calculus formula that holds for all initial states of $Q$ also holds for all initial states of $P$. A universal local mu-calculus formula is one whose negation normal form does not contain $E[U]$. Relation $\alpha$ is a bisimulation from $P$ to $Q$ if $\alpha$ is a simulation from $P$ to $Q$ and $\alpha^{-1}$ is a simulation from $Q$ to $P$. It is well known that bisimilar processes satisfy the same standard mu-calculus properties.
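For finite systems without $\tau$ actions, the largest simulation can be computed by starting from all label-compatible pairs and deleting pairs whose successor condition fails. The sketch below is a naive greatest-fixpoint refinement, not an optimized algorithm; it assumes each system is given as a state set, a successor map and a labeling into frozensets of propositions, and the function names are hypothetical.

```python
def largest_simulation(P, Q):
    """Largest relation alpha between P and Q meeting the label and successor conditions.

    A system is (states, succ, label): succ maps a state to its successor list,
    label maps a state to a frozenset of atomic propositions.
    """
    SP, succP, LP = P
    SQ, succQ, LQ = Q
    alpha = {(s, t) for s in SP for t in SQ if LP[s] == LQ[t]}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(alpha):
            # Drop (s, t) if some successor of s has no alpha-related successor of t.
            if any(all((s2, t2) not in alpha for t2 in succQ[t]) for s2 in succP[s]):
                alpha.discard((s, t))
                changed = True
    return alpha


def simulates(Q, P, init_Q, init_P):
    """Q simulates P: every initial state of P is alpha-related to an initial state of Q."""
    alpha = largest_simulation(P, Q)
    return all(any((s, t) in alpha for t in init_Q) for s in init_P)


# P can only move to an x-state; Q can additionally move to a y-state.
P = ({"p0", "p1"}, {"p0": ["p1"], "p1": ["p1"]},
     {"p0": frozenset(), "p1": frozenset({"x"})})
Q = ({"q0", "q1", "q2"}, {"q0": ["q1", "q2"], "q1": ["q1"], "q2": ["q2"]},
     {"q0": frozenset(), "q1": frozenset({"x"}), "q2": frozenset({"y"})})
print(simulates(Q, P, {"q0"}, {"p0"}), simulates(P, Q, {"p0"}, {"q0"}))  # True False
```

Here $Q$ simulates $P$ but not conversely, because the extra move of $Q$ to a $y$-state cannot be matched by $P$.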

For processes with $\tau$ transitions, one can relax the third condition to allow the possibility of stuttering (cf. [4]): if $s \,\alpha\, t$ holds, then for any state $s'$ reachable from $s$ by a finite path $\pi$ with label sequence $\tau^*; a$ (for a non-$\tau$ letter $a$), there is a state $t'$ reachable from $t$ by a finite path $\delta$ labeled $\tau^*; a$ such that $s'$ and $t'$ are related by $\alpha$, and every other pair of states $u$ on $\pi$ and $v$ on $\delta$ is related by $\alpha$. A relation satisfying this relaxed condition is a stuttering simulation. Relation $\alpha$ is a stuttering bisimulation if $\alpha$ and $\alpha^{-1}$ are stuttering simulations.


Theorem 3. Stuttering simulation preserves universal local mu-calculus properties. Stuttering bisimulation preserves all local mu-calculus properties.


4 Connecting Local Mu-Calculus Interpretations 


We explore relationships between the local and global interpretation of formulas, 
and show the following: 


- The local state spaces of balanced nodes are bisimilar. It follows from Theorem 3 that balanced nodes satisfy the same local mu-calculus formulas. From this result, to model check a property of the form $(\forall z :: f(z))$, it suffices to check $f(z)$ for the representatives of the balance equivalence classes.

- The local state space of node $m$ stuttering-simulates the global state space up to the local state of $m$. It follows from Theorem 3 that a universal local mu-calculus formula on $m$ holds globally if it holds locally.

- If processes exhibit ‘outward-facing’ interaction, i.e., (roughly) the effect of interfering transitions is independent of the internal state of the interfering process, then the local and global state spaces are stuttering-bisimilar up to the local state of $m$. It follows that the two spaces satisfy precisely the same local mu-calculus formulas over $m$.


Notation. In the proofs below, for a local state $s$ of node $n$, the notation $s[n]$ refers to the internal state of $P_n$ in $s$, and for an edge $e$ that is connected to $n$, the notation $s[e]$ refers to the value in $s$ of the variable assigned to $e$.


4.1 Bisimilarity Between Local State Spaces 


Theorem 4. Let $B$ be a balance relation on a process network $P$, and $\theta$ a compositional invariant for the network. If $P$ and $\theta$ respect $B$, then for every $(m, \beta, n)$ in $B$, $H^\theta_m$ and $H^\theta_n$ are bisimilar up to $\beta$.


Proof: The bisimulation relation $R$ relates a local state $s$ of node $m$ to a local state $t$ of node $n$ if $\beta(s) = t$. Before getting to the details of the proof, which is technical, we sketch the main reasoning. First, local transitions are easily matched by symmetry. For an interfering transition from a neighbor $k$ of $m$, by balance, there is a matching neighbor $l$ of $n$ with a symmetric interference transition. Crucially, the preservation of the compositional invariant under balance lets us transfer the joint state from which the interference transition occurs in $H^\theta_m$ to a joint state with a matching interference transition in $H^\theta_n$.

Suppose that $s, t$ are states of $m$ and $n$ in the local state spaces $H^\theta_m$ and $H^\theta_n$, respectively, such that $s \,R\, t$ holds, that is, $\beta(s) = t$. By construction of $H^\theta_m$ and $H^\theta_n$, $\theta_m(s)$ and $\theta_n(t)$ hold.

Consider a step transition $T_m(s, s')$. Since $T_m$ and $T_n$ respect the balance relation $B$, by the local symmetry between the transition relations, $T_n(\beta(s), \beta(s'))$ holds as well. Thus, for $t' = \beta(s')$, there is a step transition $T_n(t, t')$ such that $s' \,R\, t'$. By construction, $s'$ and $t'$ are successors of $s$ and $t$, respectively, in the local state spaces.

Now consider an interference transition in $H^\theta_m$ from a joint state $(s, u)$, where $u$ is a local state of a neighbor $k$ of $m$. The transition $T_k(u, u')$ creates a joint state $(s', u')$. From the definition of balance, there is a neighbor $l$ of $n$ such that for some $\gamma$, we have $(k, \gamma, l)$ in the balance relation. As $\theta$ respects $B$ by assumption, we have that $\theta_l = \gamma(\theta_k)$. As $\theta_k(u)$ holds by the definition of the interference transition, the state $v = \gamma(u)$ is in $\theta_l$. We claim that there is a matching transition from the joint state $(t, v)$.

First, we show that the pair $(t, v)$ forms a joint state. Consider any edge $f$ that is shared between $n$ and $l$. By balance, shared edges are mapped identically by $\beta$ and $\gamma$; hence, $e = \beta^{-1}(f) = \gamma^{-1}(f)$ is shared by $m$ and $k$. By the definition of $t = \beta(s)$ and $v = \gamma(u)$, we have that $t[f] = s[e]$ and $v[f] = u[e]$. As $(s, u)$ is a joint state, we have $s[e] = u[e]$; hence, $t[f] = v[f]$. As $f$ was chosen arbitrarily, it follows that $t$ and $v$ agree on the values of all shared edges, so $(t, v)$ is a joint state. Moreover, the state $t$ is in $\theta_n$ by assumption, and $v$ is in $\theta_l$ by construction.

By the similarity between $P_k$ and $P_l$, there is a transition $T_l(\gamma(u), \gamma(u'))$; letting $v' = \gamma(u')$, this can be expressed as $T_l(v, v')$. That induces an interference transition in $H^\theta_n$ from the joint state $(t, v)$ to a joint state $(t', v')$.

Finally, we show that $t' = \beta(s')$. Let $e$ be an edge connected to node $m$ and let $f = \beta(e)$. Note that $f$ is shared between $n$ and $l$ if, and only if, $e$ is shared between $m$ and $k$. Now if $f$ is not shared between $n$ and $l$, then $t'[f] = t[f]$ by definition of interference; $t[f] = s[e]$ as $t = \beta(s)$; and $s'[e] = s[e]$ by definition of interference. By transitivity, $t'[f] = s'[e]$, as required. If $f$ is a shared edge, then $t'[f] = v'[f]$ by joint state; $v'[f] = u'[e]$ as $v' = \gamma(u')$; and $u'[e] = s'[e]$ by joint state. By transitivity, $t'[f] = s'[e]$. The internal states of $t, t'$ and $s, s'$ are (respectively) identical, as they are unaffected by interference. Hence, $t' = \beta(s')$.

The proof so far shows that $R$ is a simulation if $(m, \beta, n)$ is in the balance relation. From the same argument applied to $(n, \beta^{-1}, m)$, which must also be in the balance relation, the inverse of $R$ is also a simulation. Hence, $R$ is a bisimulation between $H^\theta_m$ and $H^\theta_n$. EndProof.

We say that per-process propositional labelings respect balance if for every $(m, \beta, n)$ in the balance relation, every atomic proposition $p$, and every local state $s$: $[p \in L_n(\beta(s)) \equiv p \in L_m(s)]$. From Theorems 3 and 4, we obtain:


Corollary 1. Let $f(i)$ be a local mu-calculus formula parameterized by $i$. If the compositional invariant $\theta$ and the interpretation of the atomic propositions in $f$ respect balance relation $B$, then for any $(m, \beta, n)$ in $B$ and any local state $s$: $H^\theta_m, s \models f(m)$ if, and only if, $H^\theta_n, \beta(s) \models f(n)$.


4.2 Local-Global Simulation 


From the point of view of a process $P_m$, a transition in the global state space is either a transition of $P_m$, an interference transition by one of the neighbors of $m$, or a transition by a “far-away” process that has no immediate effect on the local state of $m$. Thus, global transitions can be simulated by step or interference transitions in the local space, with far-away transitions exhibiting stuttering. The converse need not be true, as interference transitions appear in the local space without the constraining context of the entire global state.


Theorem 5. Let the scheduling of transitions in the global system be unconditionally fair. For every $m$ and any compositional inductive invariant $\theta$, $H^\theta_m$ simulates the global transition system $G_m$ up to stuttering.


Proof: For a global state $s$, let $s[m]$ refer to the local state of node $m$ in $s$. Define the relation $R$ from global states to those of $H^\theta_m$ by $(s, t) \in R$ iff $\theta(s)$ and $s[m] = t$, where $\theta(s)$ abbreviates the global assertion $\bigwedge_i \theta_i$ applied to $s$. We show that $R$ is a simulation, up to stuttering. The proof is by cases on the kinds of transitions from global state $s$ to a successor state, $s'$. As $\theta$ is a global inductive invariant by Theorem 1, it is the case that $\theta(s')$ holds.
Suppose the transition is by process $m$. Thus, $T_m(s[m], s'[m])$ holds. As $\theta_m(s[m])$ holds, this transition is in the local state space as well. Letting $t' = s'[m]$, we have $s' \,R\, t'$.

Suppose the transition is by a neighbor $k$ of $m$, so that $T_k(s[k], s'[k])$ holds, and for all edges $e$ that are not connected to $k$, $s'[e] = s[e]$. By definition, $\theta_m(s[m])$ and $\theta_k(s[k])$ hold, so this is a valid interference transition in the local state space $H^\theta_m$. Denoting $s[k]$ by $u$, this can be re-expressed as a joint transition from state $(t, u)$ to $(t', u')$, where $u' = s'[k]$. Consider an edge $e$ that is connected to $m$ but not to $k$. Then $t'[e] =$ (by non-adjacency) $t[e] =$ (by $R$) $s[m][e] =$ (by non-adjacency) $s'[m][e]$. Now consider an edge $e$ that is shared by nodes $m$ and $k$; then $t'[e] =$ (by shared edge) $u'[e] =$ (by definition) $s'[k][e] =$ (by shared edge) $s'[m][e]$. The internal state of $m$ is unchanged on either transition. Thus, $t' = s'[m]$, so that $s' \,R\, t'$, as desired.

Finally, suppose the transition is by a process that is not a neighbor of $m$. Then $s'[m] = s[m]$, so that $s' \,R\, t$ holds. This is the stuttering step. As transitions are scheduled in an unconditionally fair manner, on any infinite computation from $s$, process $m$ or one of its neighbors must eventually make a move. Hence, every stuttering segment is finite. This establishes (fair) stuttering simulation between the two spaces. EndProof.

From the preservation of universal local mu-calculus properties under stuttering simulation, we have:


Corollary 2. If $f(m)$ is a universal local mu-calculus formula, then for any $t, s$ such that $s[m] = t$: $H^\theta_m, t \models f(m)$ implies that $G_m, s \models f(m)$ under fairness.




4.3 Outward-Facing Interactions and Local-Global Bisimulation 


The obstacle to establishing bisimilarity in the proof of Theorem 5 is that an
interference transition from local state t may not have a corresponding transition 
from a related global state s, as the internal state of the interfering neighbor in s 
may be different from the internal state of the interfering neighbor of t. In some 
protocols, however, we see that interference depends only on the shared state. 
For instance, in a form of the dining philosophers’ protocol where a process may 
give up a fork if it is not eating, the interference transition (passing a fork to a 
neighbor) is dependent only on possession of the fork. In this setting, one can 
indeed show that the two spaces are bisimilar. 

We express the independence from internal state as a stuttering bisimulation within the interfering process. Define a relation $B_{m,n}$ on the local state space of $P_n$ by $(u, v) \in B_{m,n}$ if $u$ and $v$ are both in $\theta_n$, and $u[e] = v[e]$ for every edge $e$ shared between $m$ and $n$. We say that process $n$ is outward-facing in interactions with its neighbor $m$ if the relation $B_{m,n}$ is a stuttering bisimulation on $H^\theta_n$.


Theorem 6. With outward-facing interaction, the local state space of process m 
is stuttering bisimilar to the global state space in terms of the local state of m. 


Proof: Define the relation $R$ from global states to those of $H^\theta_m$ as in the proof of Theorem 5 by $(s, t) \in R$ iff $\theta(s)$ and $s[m] = t$.




Consider a transition from $t$ to $t'$. If the move is by process $m$, it is enabled in $s$ as well, and the resulting states are related by $R$. Now suppose the move is an interference transition by a neighbor, $n$. Hence there is some joint state $(t, u)$ of $(m, n)$ such that the move is by $n$ from $(t, u)$ to $(t', u')$. As $u \in \theta_n$ (by joint state) and $s[n] \in \theta_n$ (by definition of $R$), and the two are connected to the same local state of $m$, the pair $(s[n], u)$ is in $B_{m,n}$. As $B_{m,n}$ is a stuttering bisimulation, there is a sequence, say $\sigma$, of transitions by $P_n$ alone from $s[n]$ to a state $v'$ such that $(v', u') \in B_{m,n}$, and all intermediate states on $\sigma$ from $s[n]$ to $v'$ are related by $B_{m,n}$ to $u$. Hence, the value of the shared edges between $m$ and $n$ is unchanged on $\sigma$ until the final step, where it matches $u'$. Therefore, for the global computation induced by $\sigma$ from $s$, the final state $s'$ is such that $s' \,R\, t'$, and for all intermediate global states $x$ on that path, $x \,R\, t$ holds. This shows that $R^{-1}$ is a stuttering simulation from the local to the global space. By Theorem 5, the relation $R$ is a stuttering simulation from the global to the local space. Hence, $R$ is a stuttering bisimulation between the spaces. EndProof.


Corollary 3. With outward-facing interaction and unconditionally fair scheduling, the local state space of a process $m$ satisfies the same local mu-calculus properties as the global state space.


5 Syntactic Determination of Local Symmetries 


We show how to recognize local symmetry from syntactic structure. This also applies to network families, with corresponding unbounded savings in local verification. First, we use relations between structure and global symmetry, and between global and local symmetries. Next, we show how local symmetries may be directly derived if network families are induced by a finite set of tilings. We note that when local symmetry is derived syntactically, either through the use of normal process descriptions or through building-block tiles, the computation of the compositional invariant can be done symbolically, and in the case of tilings, directly on each tile. This is unlike global symmetry reduction, where the symbolic (BDD-based) orbit relation is difficult to compute even for fully symmetric networks [5].


5.1 Program Symmetries 


Let $P = \|_{i \in [0..k-1]} P_i$, $k \geq 1$, be a fixed network where each component $P_i$ is an implementation of a process template $W$. Network topology is restricted so that all edges are bidirectional and connect only two nodes. Each $P_m$ is described by a finite transition graph where, if there is an arc from the internal node $g$ to the internal node $h$, then the arc is labeled by a guarded command $p \to A$. Transitions are given by $g : p \to A : h$, where $A$ is the local update function and $p$ is a predicate over the neighborhood of $P_m$. The action $A$ is given by a list of simultaneous updates to the shared variables, $v_1, \ldots, v_d$, where $v_i$ is the variable across the edge $(m, n_i)$.
We name the variables associated with a process, depending on the specific topology, the left variable, the right variable, the forward variable of $P_m$, etc. This modeling tactic is used (see [8]) to stipulate that the update functions for the variables be process-index independent.

Two transitions $g : p \to A : h$ and $g' : p' \to A' : h'$ are equivalent if $g = g'$, $h = h'$, $p$ is semantically equivalent to $p'$, and $A$ and $A'$ are semantically equivalent (cf. [8]). Processes $P_m$ and $P_n$ are equivalent if there is a bijective mapping between equivalent transitions of $P_m$ and $P_n$. A permutation $\pi$ of process indices is an automorphism of $P$ if $P_m$ is equivalent to $P_{\pi(m)}$ for all $m \in [0..k-1]$.

As shown in [8], the global symmetries of the program $P$, essentially the permutations of $[0..k-1]$ that leave $P$ unchanged, are a subset of the global symmetries of the global state space $G$. From $P$, one defines an undirected graph, the communication relation, $\mathit{CR}$ [8]. The nodes of $\mathit{CR}$ are the nodes $N$ of the topology $(N, E)$, and there is an edge between $m$ and $n$ in $\mathit{CR}$ iff the nodes are connected to a common edge.

P is normal [8] if the transitions of P are given in the following form: 


$$g : \Big(\bigwedge_{n \in \mathit{CR}(m)} p(m, n)\Big) \to \Big(\bigwedge_{n \in \mathit{CR}(m)} A(m, n)\Big) : h$$


where each $p(m, n)$ is a Boolean expression over the internal state of $P_m$ and the neighborhood variables of $P_m$, or equality tests between the variables local to the neighborhood of $P_m$, and the assignments of $A(m, n)$ are concurrent assignments to the neighborhood variables of $P_m$, where variable values may be swapped with each other or assigned constant values. When $P$ is a normal process network, [8] showed that global symmetries of $\mathit{CR}$ are symmetries of $P$ and are automorphisms of $G$.

This setting substantially simplifies the application of local symmetry. First, the balance relation can be “read off” directly from the relation $\mathit{CR}$, as, by results in [17], the global symmetries of $\mathit{CR}$ define a balance relation over $(N, E)$, which includes $(m, \beta, n)$ if there is a symmetry $\pi$ of $\mathit{CR}$ such that $\pi(m) = n$. Secondly, if $\mathit{CR}$ induces a transitive symmetry group, then local symmetry reduction reduces to analysis of a single representative process and its neighborhood. This may result in an exponential reduction in the cost of model checking, compared with an analysis of the entire state space. (The global symmetry used in [8] provides an exponential reduction only when $\mathit{CR}$ is fully symmetric.) The check is in general over-approximate (cf. Corollary 2) but is exact under outward-facing interaction. In the parametric setting, the reduction is unbounded.
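The communication relation and the test for a symmetry of $\mathit{CR}$ are easy to sketch for a topology whose edges each connect two nodes. The function names are hypothetical, and the sketch checks only graph symmetry of $\mathit{CR}$; it does not check that $P$ is normal. An orbit computation then indicates whether the symmetry group acts transitively, in which case a single representative suffices.

```python
def communication_relation(edge_ends):
    """CR as a set of undirected node pairs: m -- n iff they share an edge.

    edge_ends maps each bidirectional edge to the two nodes it connects.
    """
    return {frozenset(ends) for ends in edge_ends.values() if ends[0] != ends[1]}


def is_automorphism(pi, cr):
    """pi (assumed to be a permutation, given as a dict) maps CR's edge set onto itself."""
    return {frozenset(pi[x] for x in e) for e in cr} == cr


def orbit(node, generators):
    """Nodes reachable from node under the given permutations (its group orbit)."""
    seen, frontier = {node}, [node]
    while frontier:
        x = frontier.pop()
        for g in generators:
            if g[x] not in seen:
                seen.add(g[x])
                frontier.append(g[x])
    return seen


ring = {f"e{i}": (i, (i + 1) % 4) for i in range(4)}  # 4-node bidirectional ring
cr = communication_relation(ring)
rot = {i: (i + 1) % 4 for i in range(4)}
swap01 = {0: 1, 1: 0, 2: 2, 3: 3}
print(is_automorphism(rot, cr), is_automorphism(swap01, cr), orbit(0, [rot]))
```

For the ring, rotation is a symmetry of $\mathit{CR}$ whose orbit covers every node, so one representative process and its neighborhood suffice; swapping two adjacent nodes alone is not a symmetry.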


5.2 Tilings 


Rings, tori, and other ‘regular’ network patterns have considerable local symmetry but little global symmetry. Here we show how to enforce local symmetry across network families by generating them from a finite set of tiles. The tiles directly induce local symmetries and balance.

Consider a fixed, finite set of process types where each process type has a fixed, finite set of edge directions, which are given unique names. The initial condition and the transition relation of a process type may refer to the values on edges in the given direction. Each type is associated with a tile describing a fixed neighborhood pattern around a node of that type. The pattern specifies, for each edge connected to the central node, its direction from the center and the type and direction of the other process connected to it. The tiles induce a family of networks, typically of unbounded size, as follows. A network is in the family if (1) each node is assigned an instance of a process type, and (2) the neighborhood of a node matches the tile for that node type. For instance, a tile for a torus shape would have 4 neighbors, labeled north, south, east and west.

A network family constructed in this manner has an induced balance relation, $B$, defined as follows. Let $m, n$ be nodes of a network in the family. Let $(m, \beta, n)$ belong to $B$ if (a) both nodes are instances of the same type and (b) $\beta$ is the mapping which, for each direction $a$, relates the edge reachable in direction $a$ from $m$ to the edge reachable in the same direction from $n$. (E.g., it maps the north edge of $m$ to the north edge of $n$.)


Theorem 7. B is a balance relation for the induced family, with finitely many 
equivalence classes. 


Proof: We show that $B$ is a balance relation, and that it is respected by the process assignment. The mapping $\beta$ is an isomorphism of the edges connected to $m$ and $n$, as both have the same type. Moreover, as their initial conditions and transition relations are derived from those of the type and are independent of node identities, they are isomorphic up to $\beta$.

We now establish that $B$ meets the balance conditions. Consider a direction $a$. Let $m'$ ($n'$) be the node connected to $m$ ($n$) in that direction. As $m$ and $n$ have the same tiling pattern, $m'$ and $n'$ have the same type, so the tuple $(m', \gamma, n')$ is in $B$, for the isomorphism $\gamma$ between the edges of $m'$ and $n'$ given in the definition of $B$. Consider the edge $e$ reached from $m$ in direction $a$, and let $b$ be the direction from which this edge is reached from $m'$. Let $f$ be the edge in direction $a$ from $n$. As $m$ and $n$ follow the same tiling pattern, $f$ must be reached in direction $b$ from $n'$. Therefore, $\beta$ and $\gamma$ agree on this edge. As the edge was chosen arbitrarily, this establishes the balance condition. The number of equivalence classes induced by the greatest balance relation is, then, at most the number of tiles, which equals the number of process types. EndProof.
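For the torus tile, the induced bijection $\beta$ and the neighbor clause of the balance condition can be checked mechanically on a concrete instance. In this hypothetical Python sketch, the edges of a $k \times k$ torus are named `("h", i, j)` and `("v", i, j)` and every node has the single torus type; the similarity and inverse clauses hold by construction, so only the neighbor clause is tested.

```python
DIRS = ("north", "south", "east", "west")
STEP = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}


def edge(node, d, k):
    """Edge reached from node in direction d on a k-by-k torus (hypothetical naming)."""
    i, j = node
    return {"east": ("h", i, j), "west": ("h", i, (j - 1) % k),
            "south": ("v", i, j), "north": ("v", (i - 1) % k, j)}[d]


def neighbor(node, d, k):
    i, j = node
    di, dj = STEP[d]
    return ((i + di) % k, (j + dj) % k)


def tile_beta(m, n, k):
    """The induced bijection: edge in direction d at m -> edge in direction d at n."""
    return {edge(m, d, k): edge(n, d, k) for d in DIRS}


def balance_condition_holds(m, n, k):
    """Neighbour clause of the balance definition for (m, tile_beta(m, n), n)."""
    beta = tile_beta(m, n, k)
    edges_m = {edge(m, d, k) for d in DIRS}
    for d in DIRS:
        mk, nl = neighbor(m, d, k), neighbor(n, d, k)
        gamma = tile_beta(mk, nl, k)
        common = edges_m & {edge(mk, x, k) for x in DIRS}
        if any(gamma[e] != beta[e] for e in common):
            return False
    return True


K = 3
nodes = [(i, j) for i in range(K) for j in range(K)]
print(all(balance_condition_holds(m, n, K) for m in nodes for n in nodes))
```

The check succeeds for every pair of nodes, consistent with Theorem 7, which allows at most one balance class when there is a single process type.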

Theorem 7 implies that the compositional analysis of all instances of the network family can be reduced to the analysis of a finite set of representatives. This contrasts with global symmetry reduction for network families, where parameterized collapse is neither as simple nor as general. Moreover, the required representatives are just the tiles. The easy syntactic symmetry reduction contrasts with the difficulty of computing global symmetry groups for network families.


6 Applications 


Example 1. Consider a non-deterministic token-ring system $P = \|_i P_i$. The internal states of $P_i$ range over $\{T, H, E\}$, with shared variables $x_i$ and $x_{i+1}$ ranging over $\{\bot, \mathit{tok}\}$. Initially, each process is in internal state $T$ and either owns 0 tokens or owns 1 token. The initial condition specifies that a single process owns the token. Processes cycle through states in the order $T$, $H$ and $E$. A process in $H$ can move to $E$ only if it owns the token. When exiting $E$, the process puts the token on its right and enters $T$. If a process is in $T$ and has the token, then it either enters $H$ or passes the token to the right. It can be shown that the process interactions are outward-facing. Verification of the mutual exclusion property for all $i$: $AG(E_i \Rightarrow (x_i = \mathit{tok}))$ can then be performed on a model with 3 processes, which suffices to see all reachable local states.

In addition, a liveness property, for all $i$: $AG(H_i \Rightarrow AF\,E_i)$, can also be verified using a combination of local arguments. The proof is constructed as follows: first, show that the system satisfies the invariant that there is exactly 1 token in the system. Then show that every process that has the token eventually passes the token to the neighbor on the right. Using the global system fairness assumption that each process executes infinitely often, we can chain these proofs together to conclude that for any particular process $P_m$: $AG(H_m \Rightarrow AF\,E_m)$ holds, which by local symmetry implies: for all $i$: $AG(H_i \Rightarrow AF\,E_i)$.
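For comparison with the local argument, a brute-force exploration of the global state space of the 3-process instance is easy to write. The encoding below is an assumption: $x_i$ is taken to be the edge variable between $P_{i-1}$ and $P_i$, so $P_i$ owns the token when $x_i = \mathit{tok}$, and passing the token sets $x_i := \bot$ and $x_{i+1} := \mathit{tok}$.

```python
from collections import deque

BOT, TOK = "bot", "tok"


def successors(state, n):
    """Interleaving successors of a global state (internal, x) of an n-process ring."""
    internal, x = state
    out = []
    for i in range(n):
        right = (i + 1) % n
        passed = list(x)
        passed[i], passed[right] = BOT, TOK  # hand the token to the right
        passed = tuple(passed)

        def with_state(q):
            return internal[:i] + (q,) + internal[i + 1:]

        if internal[i] == "T" and x[i] == TOK:
            out.append((with_state("H"), x))       # enter H
            out.append((internal, passed))          # or pass the token on
        elif internal[i] == "H" and x[i] == TOK:
            out.append((with_state("E"), x))       # enter E
        elif internal[i] == "E":
            out.append((with_state("T"), passed))  # exit E and pass the token
    return out


def reachable(n):
    """Breadth-first search from the n initial states (all T, exactly one token)."""
    init = [(("T",) * n, tuple(TOK if j == p else BOT for j in range(n)))
            for p in range(n)]
    seen, queue = set(init), deque(init)
    while queue:
        for h in successors(queue.popleft(), n):
            if h not in seen:
                seen.add(h)
                queue.append(h)
    return seen


states = reachable(3)
mutex = all(x[i] == TOK for internal, x in states for i in range(3) if internal[i] == "E")
print(len(states), mutex)  # 9 True
```

Only the token holder ever leaves $T$, so the reachable set has 9 states (3 token positions times 3 internal states of the holder), and mutual exclusion holds in each of them.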


Example 2. Interestingly, the results about a single-token ring network can be extended to a ring with 2 tokens. However, the minimal model requires 4 processes. Similar reasoning holds for 3 tokens, and we hypothesize that it can be generalized to any fixed number of tokens. A related example is a ring with 2 types of processes, one labeled red and one labeled black. For rings with an even number of processes, half of them red and half of them black, there are 2 equivalence classes. Local symmetry reduction can be used to verify the behavior of the two equivalence classes for any even number of processes, though the networks have little global symmetry and do not have transitive symmetry.


Example 3. Several works including [3, 9, 10, 14] have considered using counting
arguments as a way of implementing full symmetry reduction. Given an $n$-process
system, with isomorphic processes having local state spaces of size $m$, and full
global symmetry on $[1..n]$, the idea is to replace the global symmetry-reduced
model with a set of $m$ counters, where the counter values record the number
of components in each of the different local states. A combinatorial argument
[22] shows that the number of combinations of $n$ isomorphic processes, each with
$m$ local states, is $(m + n - 1)!/(n!(m - 1)!)$. If $n > 2m$, this is more than $2^m$.
On the other hand, if each component has $b$ neighbors, the local representative
(full global symmetry implies a single balance class) has a local state space of
size approximately $m^b$. Over a parametric analysis, $m^b$ is a constant, and $b$, the
number of neighbors, is likely to be small in comparison with $m$.
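The contrast between the two representations can be made concrete with a few lines of Python. The sketch below evaluates the counter-abstraction size $(m+n-1)!/(n!(m-1)!)$ and a rough local estimate, which it reads as $m^b$. The values $m = 3$ and $b = 2$ are our own illustrative choices. The sketch only counts states and builds neither model.

```python
from math import comb, factorial


def counter_abstraction_size(n, m):
    """(m + n - 1)! / (n! (m - 1)!): ways to distribute n identical processes over m local states."""
    return factorial(m + n - 1) // (factorial(n) * factorial(m - 1))


def local_space_size(m, b):
    """Rough size m**b of a local representative's state space (b neighbours)."""
    return m ** b


m, b = 3, 2
for n in (4, 7, 20, 100):
    # the counter size grows with n (15, 36, 231, 5151); the local size stays 9
    print(n, counter_abstraction_size(n, m), local_space_size(m, b))
```

The same count is available directly as `comb(m + n - 1, n)`. The factorial form is kept here only to mirror the formula in the text.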


7 Discussion and Related Work 


We studied the relationship between the satisfaction of temporal properties on 
the global state space of a process network and on individual local state spaces. 
We show that “balanced” processes have bisimilar local spaces and therefore 
satisfy the same local mu-calculus formulas. Hence, for a local formula f(n) that
is universal in nature, the satisfaction of f(n) on the local space of node n implies 
that $f(n)$ holds of the global state space. Thus, if universal formulas $\{f(n)\}$ hold
for all nodes $n$, then $(\forall i: f(i))$ holds for the global state space. This provides
an approximate way to establish quantified mu-calculus properties. Moreover, 
as balanced nodes satisfy the same formulas, it is only necessary to model-check 
representatives of the balance equivalence relation. For a fixed process network, 
the restriction to local state spaces can result in exponential savings (in the 
number of nodes), and the further restriction to representative spaces results in 
a further linear cost saving. More dramatically, we show that network families 
constructed from building-block “tiles” have a finite set of representative nodes, 
so the cost saving is unbounded for parametric analysis. When network processes 
communicate with their neighbors in an outward-facing manner, these results 
carry over to the entire local mu-calculus, not just to universal properties. 

The results build on our earlier work on balance relations and local symmetry
[17,18,20]. That work focused on compositional invariants [21], the central
result being that the strongest compositional invariants for balanced nodes
are isomorphic. The current paper shows that the isomorphism applies to all 
local mu-calculus properties. The local state spaces on which the mu-calculus 
properties are evaluated are built using compositional invariants. An elegant 
methodology using 3-valued logic to compositionally verify mu-calculus prop- 
erties is developed in [23]; however, it applies to pairs of processes, and thus 
does not consider symmetries in larger networks. The definition of network fam- 
ilies through tilings has similarities to the network grammars used in [24,26]; 
however, the verification techniques are different. 

The framework of this paper considers the neighborhood of a single node. 
Compositional invariants have been generalized to apply to groups of processes, 
to accommodate properties stated over all pairs i,j, or over all neighbors i, j; 
see for example [1,6, 12,13, 16]. Construction of a comprehensive theory of neigh- 
borhood symmetry for groups of processes is still an open question. 

Global symmetry reduction, developed in [5,8,14], is based on a beautiful 
mathematical theory of automorphisms in graphs. However, in practice, symme- 
try reduction runs into difficulties, usually because there is not enough global 
symmetry in a process network, but also because for even highly symmetric 
networks, symbolic manipulation of symmetry reduced structures is difficult. In 
fact, [5] shows that any BDD-based representation of the global symmetry group
for any network with only transitive symmetry would likely incur a prohibitive 
cost. By focusing on local similarities, a strict generalization of global symme- 
tries [17,20], we can avoid these problems and obtain exponential improvements. 
The theory of local symmetries is based on network groupoids, and we note that 
any network automorphism group induces a balance relation. 

We also consider parameterized verification. For network families built from 
building-block tiles, there is a finite set of representative neighborhoods, and 
it suffices to prove a parameterized local mu-calculus property for each of 
those representatives to show that it holds for the entire family. This is an 
approximate method for parameterized verification. In prior work [20], we had 
introduced the local PCMCP (parameterized compositional model-checking)
question as a decision problem that is, in many cases, more tractable than the 
global PMCP (parameterized model-checking) problem. Deciding PCMCP for 
local mu-calculus properties is a challenging open question. 
Abstract. Parameterized verification of temporal properties is an active
research area, being extremely relevant for model-based design of com- 
plex systems. In this paper, we focus on parameter synthesis for stochas- 
tic models, looking for regions of the parameter space where the model 
satisfies a linear time specification with probability greater (or less) than 
a given threshold. We propose a statistical approach relying on simulation 
and leveraging a machine learning method based on Gaussian Processes 
for statistical parametric verification, namely Smoothed Model Check- 
ing. By injecting active learning ideas, we obtain an efficient synthesis 
routine which is able to identify the target regions with statistical guarantees.
Our approach, which is implemented in Python, scales better
than existing ones with respect to the state space of the model and the number
of parameters. It is applicable to linear time specifications with time
constraints and to more complex stochastic models than Markov Chains. 


Keywords: Parameter synthesis · Parametric verification ·
Smoothed model checking · Gaussian Processes


1 Introduction 


Overview. Stochastic models are commonly used in many areas to describe and 
reason about complex systems, from molecular and systems biology to perfor- 
mance evaluation of computer networks. In all these cases, the system dynamics 
is usually described by high-level languages such as Chemical Reaction Networks [1],
population models [2] or Stochastic Petri Nets [3], which generate an underlying 
Continuous Time Markov Chain (CTMC). Formal reasoning about these mod- 
els often amounts to the computation of reachability probabilities. This is the 
basic tool behind successful Stochastic Model Checking tools like PRISM [4] or 
the more recent STORM [5]. These tools implement numerical algorithms that
compute probabilities up to a given precision, though they suffer from state space
explosion, as well as simulation engines that allow statistical estimation when
models are too large.

All classic quantitative verification tools assume that a model is fully speci- 
fied, which is typically a strong assumption, particularly in application domains 
like systems biology, where many model parameters are estimated from data or
are only known to belong to a given range. An alternative approach is that of 
parameterised verification, which tries to verify properties for a whole set of 
models, indexed by some parameters. In case of stochastic models, this typically 
requires us to compute how reachability probabilities change as a function of 
model parameters, which is a much harder task [6]. A related problem is that of 
synthesis [7], where one looks for a subset of the parameter space where a given 
property (or multiple properties [8]) is guaranteed to be satisfied. Alternatively, 
one can try to design a system by finding a value that maximises the probability 
of satisfying a specification. 


Problem Statement. In this paper, we focus on parameter synthesis for CTMC 
models described by chemical reaction networks, benchmarking against the
approach of [7].

More specifically, we consider the following problem. We have a collection
of CTMCs, indexed by a parameter vector $\theta \in \Theta$, taking values in a bounded
and compact hyperrectangle $\Theta \subset \mathbb{R}^k$. We assume that the CTMCs depend
on $\theta$ through their rates, and that this dependency is smooth. We consider
a linear time specification $\varphi$ described by Metric Interval Temporal Logic [9],
with bounded time operators. For each $\varphi$ and $\theta$, we can in principle compute the
probability that a random trajectory, generated by that specific CTMC, satisfies
it, i.e. $P_\varphi(\theta)$.

Our goal is to find a partition of the parameter space $\Theta$ into three
classes: the positive class $\mathcal{P}_\alpha$, composed of parameters where the probability
of satisfying $\varphi$ is higher than a threshold value $\alpha$; the negative class $\mathcal{N}_\alpha$,
composed of parameters where this probability is lower than $\alpha$; and the undefined
class $\mathcal{U}_\alpha$, which collects all the other parameters. Following [7], we will look
for a partition where the volume of the undefined class is lower than a fraction of the
volume of $\Theta$. This is the threshold synthesis problem.

Our approach will be statistical: we assume that models are too complex to
numerically compute bounds on the reachability probability, and we only rely
on the possibility of simulating the model. As a consequence, our solution to
the parameter synthesis problem will have only statistical guarantees of being
correct. For example, if a parameter belongs to $\mathcal{P}_\alpha$, the confidence of this point
satisfying $P_\varphi(\theta) > \alpha$ will be larger than a prescribed probability (typically 95%
or 99%), though for most points this probability will be essentially one, and similarly
for $\mathcal{N}_\alpha$. The challenge of such an approach is that estimating the satisfaction
probability at many different points in the parameter space by simulation is very 
expensive and inefficient, unless we are able to share the information carried by 
simulation runs at neighbouring points in the parameter space. 
Contributions. We propose a Bayesian statistical approach for parameter synthesis,
which leverages a statistical parameterised verification method known as
Smoothed Model Checking [6] and the nice theoretical approximation properties
of Gaussian Processes [10]. Being based on a Bayesian inference engine, this naturally
gives statistical error bounds for the estimated probabilities. Our algorithm
uses active learning strategies to steer the exploration of the parameter space
only where the satisfaction probability is close to the threshold. We also provide
a prototype implementation of the approach in Python.

Despite being implemented in Python, our approach turns out to be remarkably
efficient, being slightly faster than [7] for small models, and outperforming it for
more complex and large models or when the number of parameters is increased,
at the price of a weaker form of correctness. Compared to [7], we also have
an additional advantage: the method treats the simulation engine and the routine
verifying the linear time specification on individual trajectories as black
boxes. This means that we can not only treat arbitrary MTL properties (while
[7] is essentially restricted to non-nested CSL properties, i.e. reachability),
but also other more complex linear time specifications (e.g. using hybrid
automata, provided that the satisfaction probability is a smooth function of
model parameters), and we can also apply the same approach to more complex
stochastic models for which efficient simulation routines exist, like stochastic
differential equations.


Related Work. Parameter synthesis of CTMC is an active field of research. 
In [7,11] the authors use Continuous Stochastic Logic (CSL) and uniformization
methods for computing exact probability bounds for parametric models
of CTMCs obtained from chemical reaction networks. In [12] the same authors 
extend their algorithm to GPU architecture to improve the scalability. Authors 
in these two papers solve two problems: one is the threshold synthesis, the other 
is the identification of a parameter configuration maximising the satisfaction 
probability. In this paper we focus on the former, as we already presented a 
statistical approach to deal with the latter problem in [13] for the single objective
case and in [8] for the multi-objective case. An alternative statistical approach 
for multi-objective optimisation is that of [14], where authors use ANOVA test 
to estimate the dominance relation. Another approach to parameter synthesis 
for CTMC is [15], where the authors rely on a combination of discretisation of 
parameters with a refinement technique. 

In this work we use a statistical approach to approximate the satisfaction 
probability function, building on Smoothed Model Checking [6]. This approach 
is applicable to CTMC with rate functions that are smooth with respect to 
parameters, and leverages statistical tools based on Gaussian Process regression 
[10] to learn an approximation of the satisfaction function from few observa- 
tions. Moreover, this approach allows us to deal with a richer class of linear 
time properties than reachability, like those described by Metric Temporal Logic 
[9,16], for which numerical verification routines suffer heavily from state
space explosion [17]. Another statistical approach is that of [18], which combines
sensitivity analysis, statistical model checking and uniform continuity to
approximate the satisfaction probability function, but it is restricted to cases
when the satisfaction probability is monotonic in the parameters. In contrast,
Gaussian Process-based methods have no such restriction (as Gaussian Processes are
universal approximators), and also have the advantage of requiring far fewer
simulations than pointwise statistical model checking, as information is shared
between neighbouring points (see [6] for a discussion in this sense). Parametric
verification and synthesis approaches are more consolidated for Discrete Time
Markov Chains [19], where mature tools like PROPhESY exist [20]; these rely
on a symbolic representation of the reachability probability, which does not
generalise to the continuous time setting.


Paper Structure. The paper is organized as follows. In Sect. 2 we discuss background
material, including Parametric CTMCs, MITL, Smoothed Model
Checking and Gaussian Processes. In Sect. 3 we present our method in detail.
In Sect. 4 we discuss experimental results, comparing with [7]. Conclusions and 
future work are discussed in Sect. 5. 


2 Background 


In this section we introduce the relevant background material: a formalism to 
describe the systems of interest, i.e. Parametric Chemical Reaction Networks, 
and one to describe linear time properties, i.e. Signal Temporal Logic. We then 
present smoothed model checking [21] and Gaussian Processes [10], which form 
the underlying statistical backbone of the parameter synthesis. 


2.1 Parametric Chemical Reaction Networks 


Chemical Reaction Networks [1] are a standard model of population processes, 
also known in the literature as Population Continuous Time Markov Chains [2] or
Markov Population Models [22]. We consider a variant with an explicit repre- 
sentation of kinetic parameters. 


Definition 1. A Parametric Chemical Reaction Network (PCRN) $\mathcal{M}$ is a tuple
$(S, X, D, x_0, R, \Theta)$ where


- $S = \{s_1, \ldots, s_n\}$ is the set of species;

- $X = (X_1, \ldots, X_n)$ is the vector of variables counting the amount of each
species, with values $X \in D$, with $D \subseteq \mathbb{N}^n$ the state space;

- $x_0 \in D$ is the initial state;

- $R = \{r_1, \ldots, r_m\}$ is the set of chemical reactions, each of the form $r_j = (v_j, \alpha_j)$,
with $v_j$ the stoichiometry or update vector and $\alpha_j = \alpha_j(X, \theta)$ the
propensity or rate function. Each reaction can be represented as

$$r_j : u_{j,1} s_1 + \ldots + u_{j,n} s_n \xrightarrow{\alpha_j} w_{j,1} s_1 + \ldots + w_{j,n} s_n,$$

where $u_{j,i}$ ($w_{j,i}$) is the amount of elements of species $s_i$ consumed (produced)
by reaction $r_j$. With $u_j = (u_{j,1}, \ldots, u_{j,n})$ (and similarly $w_j$), $v_j = w_j - u_j$.

- $\theta = (\theta_1, \ldots, \theta_k)$ is the vector of (kinetic) parameters, taking values in a compact
hyperrectangle $\Theta \subset \mathbb{R}^k$.


To stress the dependency of $\mathcal{M}$ on the parameters $\theta \in \Theta$, we will often write
$\mathcal{M}_\theta$. A PCRN $\mathcal{M}_\theta$ defines a Continuous Time Markov Chain [2,23] on $D$, with
infinitesimal generator $Q$, where $Q_{x,y} = \sum_{r_j \in R} \{\alpha_j(x, \theta) \mid y = x + v_j\}$, for $x \neq y$.
We denote by $P_\theta$ the probability over the paths $Path^{\mathcal{M}_\theta}$ of such a CTMC.
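The Python sketch below simulates a PCRN with Gillespie's stochastic simulation algorithm (direct method), which samples paths of the CTMC with the generator $Q$ above. In this sketch a network is a list of (update vector, propensity) pairs. The SIR-style network, its rate laws and its parameter values are hypothetical illustrations and are not taken from this paper.

```python
import random


def gillespie(x0, reactions, theta, t_end, rng):
    """Simulate one CTMC path of a PCRN with Gillespie's direct method.

    reactions: list of (update_vector v_j, propensity alpha_j(x, theta)) pairs.
    Returns the piecewise-constant path as a list of (jump_time, state).
    """
    t, x = 0.0, tuple(x0)
    path = [(t, x)]
    while True:
        rates = [alpha(x, theta) for _, alpha in reactions]
        total = sum(rates)
        if total <= 0.0:                  # absorbing state: no reaction can fire
            return path
        t += rng.expovariate(total)       # exponential holding time with rate sum_j alpha_j
        if t > t_end:
            return path
        # fire reaction j with probability alpha_j / total
        j = rng.choices(range(len(reactions)), weights=rates)[0]
        x = tuple(xi + vi for xi, vi in zip(x, reactions[j][0]))
        path.append((t, x))


# Hypothetical SIR network on species (S, I, R) with parameters theta = (k_inf, k_rec)
N = 100
SIR = [
    ((-1, 1, 0), lambda x, th: th[0] * x[0] * x[1] / N),  # infection: S + I -> 2I
    ((0, -1, 1), lambda x, th: th[1] * x[1]),              # recovery:  I -> R
]
path = gillespie((95, 5, 0), SIR, theta=(0.12, 0.05), t_end=100.0, rng=random.Random(1))
print(len(path), path[-1])
```

Each call returns one trajectory. Statistical methods repeat such calls to obtain samples from $P_\theta$.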


2.2 Metric Interval Temporal Logic 


Metric Interval Temporal Logic (MITL [16]) is a discrete linear time tempo- 
ral logic used to reason about the future evolution of a path in continuous time. 
Generally this formalism is used to qualitatively describe the behaviors of trajec- 
tories of differential equations or stochastic models. The temporal operators we 
consider are all time-bounded, like in Signal Temporal Logic [9], a signal-based 
version of MITL. This implies that time-bounded trajectories are sufficient to 
verify every formula. The atomic predicates of MITL are inequalities on a set of
real-valued variables, i.e. of the form $\mu(X) := [g(X) > 0]$, where $g : \mathbb{R}^n \rightarrow \mathbb{R}$ is a
continuous function and consequently $\mu : \mathbb{R}^n \rightarrow \{\top, \bot\}$.


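Definition 2. A formula $\varphi \in \mathcal{F}$ of MITL is defined by the following syntax:


$$\varphi := \bot \mid \top \mid \mu \mid \neg\varphi \mid \varphi \vee \varphi \mid \varphi\, \mathbf{U}_{[T_1, T_2]}\, \varphi, \qquad (1)$$

where $\mu$ are atomic predicates as defined above, and $T_1 < T_2 < +\infty$.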


Eventually and globally modal operators are defined as customary as $\mathbf{F}_{[T_1, T_2]}\varphi = \top\, \mathbf{U}_{[T_1, T_2]}\, \varphi$
and $\mathbf{G}_{[T_1, T_2]}\varphi = \neg\mathbf{F}_{[T_1, T_2]}\neg\varphi$. MITL formulae are interpreted over the
paths $x(t)$ of a PCRN $\mathcal{M}_\theta$. We will consider here the Boolean semantics of [9],
which, given a trajectory $x(t)$, returns either true or false, referring the reader to
[9] for its definition and for a description of monitoring algorithms. Combining
this with the probability distribution $P_\theta$ over trajectories induced by a PCRN
model $\mathcal{M}_\theta$, we obtain the satisfaction probability of a formula $\varphi$ as


$$P_\varphi(\theta) = P(\varphi \mid \mathcal{M}_\theta) := P_\theta(\{x(t) \in Path^{\mathcal{M}_\theta} \mid (x, 0) \models \varphi\})$$
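Consider the small fragment made of a single eventually or globally operator applied to an atomic predicate. For this fragment, the Boolean semantics can be checked directly on the piecewise-constant paths produced by a CTMC simulator. The Python sketch below is a hypothetical monitor for that fragment only. It is not the general monitoring algorithm of [9], and it assumes that the simulated path covers the whole window $[T_1, T_2]$.

```python
def states_in_window(path, t1, t2):
    """States held by a piecewise-constant path at some time in [t1, t2].

    path: list of (jump_time, state) sorted by time, with path[0][0] == 0;
    each state is held until the next jump.
    """
    current = path[0][1]
    for t, x in path:
        if t <= t1:
            current = x                   # state in force at time t1
    return [current] + [x for t, x in path if t1 < t <= t2]


def eventually(path, t1, t2, g):
    """Boolean semantics of F_[t1,t2] mu at time 0, with mu(x) = [g(x) > 0]."""
    return any(g(x) > 0 for x in states_in_window(path, t1, t2))


def globally(path, t1, t2, g):
    """G_[t1,t2] mu = not F_[t1,t2] not mu."""
    return all(g(x) > 0 for x in states_in_window(path, t1, t2))


# Toy path of one counter: 0 until t = 1, 3 until t = 2.5, then 1
path = [(0.0, (0,)), (1.0, (3,)), (2.5, (1,))]
print(eventually(path, 0.5, 2.0, lambda x: x[0] - 2))  # True: x = 3 > 2 on [1, 2.5)
print(globally(path, 2.0, 4.0, lambda x: x[0] - 2))    # False: x = 1 after t = 2.5
```

Since the state is constant between jumps, inspecting the states in force during the window is exact for this fragment. Nested temporal operators would need a proper monitoring algorithm.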


2.3 Parametric Verification and Smoothed Model Checking 
Given an MITL formula $\varphi$ and a CTMC $\mathcal{M}_\theta$, we consider two verification tasks:


- (Classic) Verification: compute or estimate the satisfaction probability $P_\varphi(\theta)$
for a fixed $\theta$.

- Parametric verification: compute or estimate the satisfaction probability
$P_\varphi(\theta)$ as a function of $\theta \in \Theta$.
The classic verification task can be solved with specialised numerical algorithms
[17,24]. These methods calculate $P_\varphi(\theta)$ by a clever numerical integration
of the Kolmogorov equations of the CTMC. This approach, however, suffers from
the curse of state space explosion, becoming inefficient for big or complex models.
A viable alternative is rooted in statistics. The key idea is to estimate the satisfaction
probability by combining simulation and monitoring of MITL formulas.
In practice, for each trajectory $x$ generated by a simulation of the CTMC $\mathcal{M}_\theta$,
we verify if $x \models \varphi$. This produces observations of a Bernoulli random variable
$Z_\theta$, which is equal to 1 if and only if the trajectory satisfies the property, and 0
otherwise. By definition, the probability of observing 1 is exactly $P_\varphi(\theta)$, which
can thus be estimated by frequentist or Bayesian statistical inference [25, 26].
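The statistical route can be sketched in a few lines of Python. A hypothetical `sample_z` callable stands for "simulate $\mathcal{M}_\theta$ once and monitor $\varphi$" and returns one observation of $Z_\theta$. The sketch uses a frequentist Wilson score interval and the Chernoff-Hoeffding bound on the number of runs. These are standard textbook choices, not necessarily the estimators of [25, 26].

```python
import math
import random
from statistics import NormalDist


def hoeffding_runs(eps, delta):
    """Runs N guaranteeing |p_hat - p| <= eps with probability >= 1 - delta (Chernoff-Hoeffding)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def estimate(sample_z, n_runs, confidence, rng):
    """Estimate P(Z = 1) from n_runs Bernoulli observations, with a Wilson score interval."""
    k = sum(1 for _ in range(n_runs) if sample_z(rng))
    p_hat = k / n_runs
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # two-sided normal quantile
    denom = 1.0 + z * z / n_runs
    centre = (p_hat + z * z / (2 * n_runs)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n_runs + z * z / (4 * n_runs ** 2)) / denom
    return p_hat, (centre - half, centre + half)


print(hoeffding_runs(0.01, 0.05))  # 18445
# Stand-in for "simulate and monitor": a run satisfies phi with probability 0.3
p_hat, interval = estimate(lambda r: r.random() < 0.3, 20000, 0.95, random.Random(0))
print(p_hat, interval)
```

The run count needed for a fixed precision does not depend on the model, but every parameter point needs its own batch of runs. This cost is what motivates sharing information across points in the parametric setting.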

Parametric verification brings additional challenges. For PCRN, the numeri- 
cal approach of [27] provides upper and lower bounds on the satisfaction function. 
By decomposing the parameter space in small regions, one can provide a tight 
approximation of the satisfaction function, at the price of a polynomial cost in 
the dimension of the state space and of an exponential cost in the dimension of 
the parameter space [27]. 

The statistical counterpart for parametric verification is known as Smoothed 
Model Checking [6]. This method combines simulations in few points of the 
parameter space with state-of-the-art generalised regression methods from statis- 
tics and machine learning to infer an analytic approximation of the satisfaction 
function, mapping each $\theta$ to the corresponding value of $P_\varphi(\theta)$. The basic idea
is to cast the estimation of the satisfaction function as a learning problem: from 
the observation of few simulation runs at some points of the parameter space, we 
wish to learn an approximation of the satisfaction function, with statistical error 
guarantees. Smoothed Model Checking solves this problem relying on Gaussian 
Process (generalised) regression, a Bayesian non-parametric method that returns 
in each point an estimate of the value of the satisfaction function together with 
confidence bounds, defining the region containing the true value of the function 
with a prescribed probability. The only substantial requirement for Smoothed 
Model Checking is that the satisfaction probability is smooth with respect to 
the parameters. This holds for MITL properties interpreted over PCTMCs [6]. 
Smoothed Model Checking will be the key tool for our synthesis problem, hence 
we will introduce it in more detail, after a brief introduction of its underlying 
inference engine, i.e. Gaussian Processes. 


Gaussian Processes. Gaussian Processes (GPs) are a family of distributions 
over function spaces, used mostly for Bayesian non-parametric classification or 
regression. More specifically, a GP is a collection of random variables $f(x) \in \mathbb{R}$
($x \in E$, a compact subset of $\mathbb{R}^n$) of which any finite subset defines a multivariate
normal distribution. A GP is uniquely determined by its mean and covariance
functions (also called kernels), denoted respectively by $m : E \rightarrow \mathbb{R}$ and
$k : E \times E \rightarrow \mathbb{R}$, such that for every finite set of points $(x_1, x_2, \ldots, x_n)$:


$$f \sim \mathcal{GP}(m, k) \iff (f(x_1), f(x_2), \ldots, f(x_n)) \sim \mathcal{N}(\mathbf{m}, K) \qquad (2)$$

where $\mathbf{m} = (m(x_1), m(x_2), \ldots, m(x_n))$ is the mean vector and $K \in \mathbb{R}^{n \times n}$ is the
covariance matrix, such that $K_{ij} = k(x_i, x_j)$. From a functional point of view, a
GP is a probability distribution on the set of functions $g : E \rightarrow \mathbb{R}$. The choice
of the covariance function is important from a modeling perspective because it
determines which functions will be sampled with higher probability from a GP,
see [10].
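Equation (2) suggests directly how to sample from a GP prior on a finite set of points: draw the vector of function values from $\mathcal{N}(\mathbf{m}, K)$. The NumPy sketch below does this for one-dimensional inputs, assuming a zero mean and the Gaussian RBF kernel discussed in this section. It uses a Cholesky factor of $K$ plus a small jitter term. The function names and the jitter value are our own choices.

```python
import numpy as np


def rbf_kernel(xa, xb, lengthscale):
    """Gaussian RBF covariance k(x, x') = exp(-||x - x'||^2 / l^2) for 1-D inputs."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-(d ** 2) / lengthscale ** 2)


def sample_gp_prior(xs, lengthscale, n_samples, seed=0):
    """Draw functions from GP(0, k) on the points xs.

    By Eq. (2), (f(x_1), ..., f(x_n)) ~ N(0, K); with K = L L^T, L z is such a draw
    when z is standard normal.
    """
    K = rbf_kernel(xs, xs, lengthscale)
    L = np.linalg.cholesky(K + 1e-6 * np.eye(len(xs)))  # jitter for numerical stability
    rng = np.random.default_rng(seed)
    return (L @ rng.standard_normal((len(xs), n_samples))).T


xs = np.linspace(0.0, 1.0, 50)
samples = sample_gp_prior(xs, lengthscale=0.2, n_samples=3)
print(samples.shape)  # (3, 50)
```

Smaller lengthscales produce visibly wigglier sample functions. This is one way to see how the kernel encodes prior assumptions on smoothness.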

GPs are popular as they provide a Bayesian non-parametric framework for
regression and classification. Starting from a training set $\{(x_i, y_i)\}_{i=1,\ldots,n}$ of
input $x_i$ and output $y_i$ pairs, and a prior GP, typically with zero mean and a
given covariance function, GP regression computes a posterior distribution given
the observations, which is another GP, whose mean and covariance depend on
the prior kernel and the observation points. In particular, for real-valued $y_i$
and Gaussian observation noise, the posterior mean at a point $x^*$ is a linear
combination of the prior kernel $k(x^*, x_i)$ evaluated at $x^*$ and the observation points
$x_i$, with coefficients depending on the observations $y_i$. The prior kernel thus plays
a central role, and it sometimes depends on hyperparameters, which can be set
automatically by optimising the marginal likelihood, as traditionally happens in
Bayesian methods [10].

In this work we use the Gaussian Radial Basis Function (GRBF) kernel
[10], as samples from a GP defined by it can approximate arbitrarily well any
continuous function on a compact set $E$. The kernel is defined as


$$k(x_1, x_2) = \exp\left(-\|x_1 - x_2\|^2 / l^2\right),$$


where $l$ is the lengthscale hyperparameter, which roughly governs how far away
observations contribute to predictions at a point (if $x^*$ and $x_i$ are
much more distant than $l$, then $k(x^*, x_i)$ is approximately zero). Moreover, $l$
determines the Lipschitz constant of the GRBF kernel, which is of order $1/l$, and a
fortiori of the prediction itself (being a linear combination of kernel functions).
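For real-valued observations with Gaussian noise, the regression step has a closed form. The NumPy sketch below computes the posterior mean and standard deviation under several simplifications: one-dimensional inputs, zero prior mean, and a fixed lengthscale instead of one optimised by marginal likelihood. The names are illustrative. The code shows that the mean is the kernel combination $\sum_i c_i k(x^*, x_i)$. It also shows that, at distances much larger than $l$ from the data, the prediction reverts to the prior. Smoothed Model Checking itself uses a Binomial likelihood instead, which is not shown here.

```python
import numpy as np


def rbf(xa, xb, l):
    """k(x, x') = exp(-||x - x'||^2 / l^2) for 1-D input arrays."""
    return np.exp(-((xa[:, None] - xb[None, :]) ** 2) / l ** 2)


def gp_posterior(x_train, y_train, x_test, l, noise_var):
    """Posterior mean and std of GP(0, k) regression with Gaussian noise.

    mean(x*) = sum_i c_i k(x*, x_i) with c = (K + noise_var I)^{-1} y.
    """
    K = rbf(x_train, x_train, l) + noise_var * np.eye(len(x_train))
    K_star = rbf(x_test, x_train, l)
    coeffs = np.linalg.solve(K, y_train)
    mean = K_star @ coeffs
    v = np.linalg.solve(K, K_star.T)
    var = 1.0 - np.sum(K_star * v.T, axis=1)     # prior variance k(x*, x*) = 1 here
    return mean, np.sqrt(np.maximum(var, 0.0))


x_train = np.linspace(0.0, 1.0, 8)
y_train = np.sin(2 * np.pi * x_train)
mean, std = gp_posterior(x_train, y_train, np.array([0.3, 5.0]), l=0.2, noise_var=1e-6)
print(mean, std)  # near the data std is small; at x = 5 it is about 1 (the prior)
```

Varying `l` in this sketch shows the trade-off described above. A short lengthscale makes predictions depend only on very close observations, while a long one smooths across the whole domain.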


Smoothed Model Checking. Smoothed Model Checking is a statistical
method to estimate the function $P_\varphi(\theta)$, casting it into a learning problem taking
as input the truth value of $\varphi$ for several simulations at different parameter values
$\theta_1, \ldots, \theta_n$, with few simulation runs ($M < +\infty$) per parameter point. The
method tries to reconstruct a real-valued latent function $f(\theta)$, which is squeezed
to $[0,1]$ via the Probit transform¹ $\Psi$ to give the satisfaction probability at $\theta$:
$P_\varphi(\theta) = \Psi(f(\theta))$. Let us denote by $O = [o_1, o_2, \ldots, o_n]$ the matrix whose
rows $o_i$ are the Boolean $M$-vectors of the evaluations in $\theta_i$. Hence, we have that
each observation $o_i$ is an independent draw from a $\mathrm{Binomial}(M, P_\varphi(\theta_i))$.
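The generative side of this observation model is easy to simulate, which can be useful for testing an implementation. The Python sketch below assumes a hypothetical latent function $f$. At each parameter point it counts how many of $M$ Boolean outcomes are true, with success probability $\Psi(f(\theta_i))$, so each count is a $\mathrm{Binomial}(M, \Psi(f(\theta_i)))$ sample. The inference step (Expectation Propagation) is not shown.

```python
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal Z ~ N(0, 1)


def probit(f):
    """Squeeze a latent value into [0, 1]: Psi(f) = p(Z <= f)."""
    return PHI.cdf(f)


def observe(latent, thetas, M, rng):
    """Number of satisfying runs out of M at each theta_i, success probability Psi(f(theta_i))."""
    counts = []
    for theta in thetas:
        p = probit(latent(theta))
        counts.append(sum(rng.random() < p for _ in range(M)))
    return counts


# Hypothetical latent function on a one-dimensional parameter space [0, 1]
counts = observe(lambda t: 4 * t - 2, [0.0, 0.25, 0.5, 0.75, 1.0], M=10, rng=random.Random(0))
print(counts)
```

Because $\Psi$ is monotone and maps $\mathbb{R}$ onto $(0,1)$, a Gaussian model can be placed on the unconstrained latent $f$ while the predicted probabilities stay in range.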
Smoothed Model Checking plugs these observations into a Bayesian infer- 
ence scheme, assuming a prior p(f) for the latent variable f. As f is a random 
function, one can take as a prior a GP, specifying its mean and kernel function, 


¹ The Probit $\Psi(x) = p(Z \leq x)$ is the cumulative distribution function of a standard
normal distribution $Z \sim \mathcal{N}(0, 1)$, evaluated at the point $x$.
and then invoke Bayes theorem to compute the joint posterior distribution of $f$
at a prediction point $\theta^*$ and at the observation points $\theta_1, \ldots, \theta_n$ as


$$p(f(\theta^*), f(\theta_1), \ldots, f(\theta_n) \mid O) = \frac{p(f(\theta^*), f(\theta_1), \ldots, f(\theta_n)) \prod_{i=1}^{n} p(o_i \mid f(\theta_i))}{Z}.$$


In the previous expression, on the right hand side, $Z$ is a normalisation constant,
while $p(f(\theta^*), f(\theta_1), \ldots, f(\theta_n))$ is the prior, which is a Gaussian distribution with
mean and covariance matrix computed according to the GP. $p(o_i \mid f(\theta_i))$,
instead, is the noise model, which in our case is given by a Binomial density.
By integrating out the value of the latent function at the observation points in the
previous expression, one gets the predictive distribution


$$p(f(\theta^*) \mid O) = \int \prod_{i=1}^{n} df(\theta_i)\; p(f(\theta^*), f(\theta_1), \ldots, f(\theta_n) \mid O).$$


The presence of a Binomial observation model makes this integral analytically 
intractable, and forces us to resort to an efficient variational approximation 
known as Expectation Propagation [6,10]. The result is a Gaussian form for
the predictive distribution $p(f(\theta^*) \mid O)$, whose mean and $\delta$-confidence region
are then Probit transformed into $[0, 1]$.

It is important to stress that the prediction of Smoothed Model Checking, 
being a Bayesian method, depends on the choice of the prior. In case of Gaussian 
Processes, choosing the prior means fixing a covariance function, which makes 
assumptions on the smoothness and density of the functions that can be sampled 
by the GP. The Gaussian Radial Basis Function is dense in the space of con- 
tinuous functions over a compact set [28], hence it can approximate arbitrarily 
well the satisfaction probability function. By setting its lengthscale via marginal 
likelihood optimization, we are picking the best prior for the observed data. 


3 Methodology 


3.1 Problem Definition 


We start by rephrasing the parameter synthesis problem defined in [7] in the 
context of Bayesian statistics, where truths are quantified probabilistically. The 
basic idea is that we will exhibit a set of parameters that satisfy the specification 
with high confidence, which in the Bayesian world means with high posterior 
probability. To recall and fix the notation, let $\mathcal{M}_\theta$ be a PCRN defined over a
parameter space $\Theta$, $\varphi$ an MITL formula and $\hat{P}_\varphi(\theta)$ a statistical approximate
model of the satisfaction probability of $\varphi$ at each point $\theta$. In the Bayesian setting,
$\hat{P}_\varphi(\theta)$ is in fact a posterior probability distribution over $[0,1]$, hence we can
compute for each measurable set $B \subseteq [0,1]$ the probability $p(\hat{P}_\varphi(\theta) \in B)$.


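Problem (Bayesian Threshold Synthesis): Let $\mathcal{M}_\theta$, $\Theta$, $\varphi$, and $\hat{P}_\varphi(\theta)$ be as before.
Fix a threshold $\alpha$ and consider the threshold inequality $P_\varphi(\theta) > \alpha$, for the true
satisfaction probability $P_\varphi(\theta)$. Fix $\varepsilon > 0$ a volume tolerance, and $\delta \in (0.5, 1]$
a confidence threshold. The Bayesian threshold synthesis problem consists in
partitioning the parameter space $\Theta$ in three classes $\mathcal{P}_\alpha$ (positive), $\mathcal{N}_\alpha$ (negative)
and $\mathcal{U}_\alpha$ (undefined) as follows:


- for each $\theta \in \mathcal{P}_\alpha$, $p(\hat{P}_\varphi(\theta) > \alpha) > \delta$;
- for each $\theta \in \mathcal{N}_\alpha$, $p(\hat{P}_\varphi(\theta) < \alpha) > \delta$;
- $\mathcal{U}_\alpha = \Theta \setminus (\mathcal{P}_\alpha \cup \mathcal{N}_\alpha)$, and $\mathrm{vol}(\mathcal{U}_\alpha)/\mathrm{vol}(\Theta) < \varepsilon$, where vol is the volume of the set.


Note that the set $\mathcal{P}_\alpha$ solves the threshold synthesis problem defined above, while
$\mathcal{N}_\alpha$ solves the threshold synthesis problem $P_\varphi(\theta) < \alpha$.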


3.2 Bayesian Parameter Synthesis: The Algorithm 


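Our Bayesian synthesis algorithm essentially combines smoothed Model Checking
(smMC) with an active learning step to adaptively refine the sets $\mathcal{P}_\alpha$, $\mathcal{N}_\alpha$, $\mathcal{U}_\alpha$,
trying to keep the number of simulations of the PCRN $\mathcal{M}_\theta$ to a minimum. smMC
is used to compute a Bayesian estimate of the satisfaction probability, given the
samples of the truth of $\varphi$ accumulated up to a certain point. More specifically,
we use the posterior distribution $p(\hat{P}_\varphi(\theta))$ of the satisfaction probability at each
$\theta$ returned by smMC to compute the following two functions of $\theta$:


- $\lambda^+(\theta, \delta)$ is such that $p(\hat{P}_\varphi(\theta) < \lambda^+(\theta, \delta)) > \delta$
- $\lambda^-(\theta, \delta)$ is such that $p(\hat{P}_\varphi(\theta) > \lambda^-(\theta, \delta)) > \delta$

Essentially, at each point $\theta$, $\lambda^+(\theta, \delta)$ is the upper bound for the estimate $\hat{P}_\varphi(\theta)$
at confidence $\delta$ (i.e. with probability at least $\delta$, the true value $P_\varphi(\theta)$ is less than
$\lambda^+$), while $\lambda^-(\theta, \delta)$ is the lower bound. These two values will be used to split
the parameter space into the three regions $\mathcal{P}_\alpha$, $\mathcal{N}_\alpha$, $\mathcal{U}_\alpha$ as follows:


- $\theta \in \mathcal{P}_\alpha$ iff $\lambda^-(\theta, \delta) > \alpha$
- $\theta \in \mathcal{N}_\alpha$ iff $\lambda^+(\theta, \delta) < \alpha$
- $\mathcal{U}_\alpha = \Theta \setminus (\mathcal{P}_\alpha \cup \mathcal{N}_\alpha)$, with $\mathrm{vol}(\mathcal{U}_\alpha)/\mathrm{vol}(\Theta) < \varepsilon$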


To dig into how $\lambda^+$ and $\lambda^-$ are computed, recall that smMC computes a real-valued
Gaussian process $f(\theta)$, with mean function $\mu$ and covariance function
$k$, from which the pointwise standard deviation can be obtained as $\sigma(\theta) = \sqrt{k(\theta, \theta)}$.
At each $\theta$, the function $f(\theta)$ is Gaussian distributed, hence we can
compute the upper and lower confidence bounds for the Gaussian, and then
squeeze them into $[0,1]$ by the Probit transform $\Psi$. Letting $\beta_\delta = \Psi^{-1}\left(\frac{\delta + 1}{2}\right)$, as
customary while working with Normal distributions, we get:


- $\lambda^+(\theta, \delta) = \Psi(\mu(\theta) + \beta_\delta\, \sigma(\theta))$
- $\lambda^-(\theta, \delta) = \Psi(\mu(\theta) - \beta_\delta\, \sigma(\theta))$
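Given the latent posterior mean and standard deviation at a point, the bounds and the resulting classification can be computed as in the following Python sketch. It assumes $\beta_\delta = \Psi^{-1}((\delta+1)/2)$ and treats each point independently, labelling it by the three rules listed above. The helper names are hypothetical. In a full implementation, $\mu$ and $\sigma$ would come from the smMC posterior.

```python
from statistics import NormalDist


def bounds(mu, sigma, delta):
    """(lambda^-, lambda^+) at confidence delta from the latent Gaussian N(mu, sigma^2).

    Assumes beta_delta = Psi^{-1}((delta + 1) / 2), with Psi the standard normal CDF.
    """
    std = NormalDist()
    beta = std.inv_cdf((delta + 1.0) / 2.0)
    return std.cdf(mu - beta * sigma), std.cdf(mu + beta * sigma)


def classify(mu, sigma, alpha, delta):
    """Assign a parameter point to the positive, negative or undefined class."""
    lam_minus, lam_plus = bounds(mu, sigma, delta)
    if lam_minus > alpha:
        return "positive"
    if lam_plus < alpha:
        return "negative"
    return "undefined"


def undefined_fraction(labels):
    """Fraction of equal-volume cells labelled undefined, to be compared with epsilon."""
    return sum(lab == "undefined" for lab in labels) / len(labels)


points = [(1.5, 0.2), (-1.5, 0.2), (0.1, 0.5)]       # illustrative (mu, sigma) pairs
labels = [classify(mu, s, alpha=0.5, delta=0.95) for mu, s in points]
print(labels)  # ['positive', 'negative', 'undefined']
```

Because $\Psi$ is monotone, comparing $\lambda^-$ with $\alpha$ is equivalent to comparing $\mu - \beta_\delta\sigma$ with $\Psi^{-1}(\alpha)$ in latent space. An implementation could use either form.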
Algorithm 1. Bayesian Parameter Synthesis.
Input: Θ parameter space, M PCRN, φ MITL formula, α threshold, ε volume precision,
δ confidence
S ← initial_samples(Θ, M, φ)
P_α ← ∅, N_α ← ∅, U_α ← Θ
while true do
    λ⁺, λ⁻ ← smoothed_MC(Θ, S)
    P_α, N_α, U_α ← update_regions(λ⁺, λ⁻, P_α, N_α, U_α)
    if vol(U_α)/vol(Θ) < ε then
        return P_α, N_α, U_α
    else
        S ← refine_samples(S, U_α)
    end if
end while


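Algorithm 1 can also be transcribed as the Python skeleton below. The sketch makes four simplifying assumptions:

- the parameter space is a fixed list of equal-volume cells, identified by their centres;
- `update_regions` is inlined as the classification rule on $\lambda^-$ and $\lambda^+$;
- a `max_iter` guard replaces `while true`;
- the toy `smoothed_mc` stand-in, not a real smMC posterior, takes the true probability equal to the cell centre, with bounds shrinking as $1/\sqrt{N}$.

All names are hypothetical.

```python
import math


def bayesian_synthesis(cells, initial_samples, smoothed_mc, refine_samples,
                       alpha, eps, max_iter=50):
    """Skeleton of Algorithm 1 over a fixed list of equal-volume cells."""
    samples = initial_samples()
    positive, negative, undefined = set(), set(), set(cells)
    for _ in range(max_iter):
        lam_plus, lam_minus = smoothed_mc(samples)
        # update_regions: move undefined cells whose bounds clear the threshold
        for c in list(undefined):
            if lam_minus(c) > alpha:
                undefined.remove(c)
                positive.add(c)
            elif lam_plus(c) < alpha:
                undefined.remove(c)
                negative.add(c)
        if len(undefined) / len(cells) < eps:    # vol(U) / vol(Theta) for equal cells
            break
        samples = refine_samples(samples, undefined)
    return positive, negative, undefined


def toy_smoothed_mc(n_runs):
    """Stand-in posterior: true probability = cell centre, half-width 1/sqrt(n_runs)."""
    half = 1.0 / math.sqrt(n_runs)
    return (lambda c: c + half), (lambda c: c - half)


cells = [i / 100 + 0.005 for i in range(100)]   # centres of 100 equal cells of [0, 1]
P, N, U = bayesian_synthesis(cells, lambda: 10, toy_smoothed_mc,
                             lambda n, undefined: 2 * n, alpha=0.5, eps=0.05)
print(len(P), len(N), len(U))  # 48 48 4
```

In this toy run, refinement simply doubles a global run count. The method described in the text instead concentrates new simulations in the undefined region, which is where the savings come from.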


The Bayesian synthesis procedure is described in Algorithm 1, which after
initialisation enters the main loop, in which the positive, negative, and uncertain
sets are computed adaptively until convergence. Before
proceeding further, we introduce some notation to describe regular grids, as
they are used in the current implementation of the method. Let us consider
the hyper-rectangular parameter space $\Theta = \times_{i=1}^{n} [w_i^\perp, w_i^\top] \subset \mathbb{R}^n$, where $w_i^\perp$
and $w_i^\top$ are respectively the lower and the upper bound of the domain of the
parameter $\theta_i$. An $h$-grid of $\Theta$ is the set $h\text{-grid} = \bigcup_{m \in M} \{w^\perp + m * h\}$, where
$h = (h_1, \ldots, h_n)$, $M = \times_{i=1}^{n} \{0, \ldots, (w_i^\top - w_i^\perp)/h_i\}$, $w^\perp = (w_1^\perp, \ldots, w_n^\perp)$, and $*$ is
the elementwise multiplication. Given a grid, we define a basic cell as a small
hyperrectangle of size $h$ whose vertices are points of the grid.
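The $h$-grid definition translates directly into a Cartesian product over the per-parameter axes. In this minimal sketch (function name hypothetical) we assume that each step $h_i$ divides the interval width $w_i^\top - w_i^\perp$; otherwise the rounding silently moves the last grid point.

```python
import itertools
from typing import List, Sequence, Tuple


def h_grid(lower: Sequence[float], upper: Sequence[float],
           h: Sequence[float]) -> List[Tuple[float, ...]]:
    """All points w_bot + m * h with m_i in {0, ..., (w_top_i - w_bot_i) / h_i}."""
    axes = []
    for lo, hi, step in zip(lower, upper, h):
        n_steps = int(round((hi - lo) / step))  # assumes h_i divides the interval width
        axes.append([lo + m * step for m in range(n_steps + 1)])
    # The Cartesian product of the axes is the set {w_bot + m * h : m in M}.
    return list(itertools.product(*axes))
```

For example, h_grid((0.0, 0.0), (1.0, 2.0), (0.5, 1.0)) returns the 3 × 3 = 9 points of a grid with steps 0.5 and 1.0.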


Initialisation. The initialisation phase consists of running some simulations of
the PCRN at some points of the parameter space, to obtain a first reconstruction
of the satisfaction function. As we do not need to be very precise in every
part of the parameter space, but only for points $\theta$ whose satisfaction probability
$P_\varphi(\theta)$ is close to the threshold $\alpha$, we start by simulating the model on all
parameters of a coarse grid, the $h_0$-grid, with $h_0$ chosen such that the total number
of parameters $\theta$ explored is small enough for smMC to be fast. The actual
choice will depend on the number of dimensions of the parameter space, as the grid
size grows exponentially with it. Once the $h_0$-grid is fixed, we simulate $N$ runs
of the model at each point and pass them to a monitoring algorithm for MITL,
obtaining $N$ observations of the truth value of the property $\varphi$ at each point of
the $h_0$-grid, collected in the set $S$. We also initialise the sets $P_\alpha$, $N_\alpha$, and $U_\alpha$.
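The initialisation step can be sketched as follows. run_and_check is a hypothetical callable standing for "simulate one trajectory of the PCRN at θ and monitor φ on it"; storing, for each grid point, the number of satisfying runs out of $N$ is our assumption about how the Boolean observations are collected in $S$.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Sample = Tuple[Point, int, int]  # (theta, runs satisfying phi, total runs)


def initial_samples(grid: Sequence[Point],
                    run_and_check: Callable[[Point, random.Random], bool],
                    n_runs: int, seed: int = 0) -> List[Sample]:
    """Simulate n_runs trajectories at every point of the coarse h0-grid."""
    rng = random.Random(seed)  # one seeded generator for reproducibility
    samples = []
    for theta in grid:
        successes = sum(run_and_check(theta, rng) for _ in range(n_runs))
        samples.append((theta, successes, n_runs))
    return samples
```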


Computation of $P_\alpha$, $N_\alpha$, and $U_\alpha$ Regions. The algorithm then enters the
main loop, first running smMC with the current set of sample points $S$ to compute
the two functions $\lambda^+$ and $\lambda^-$. These are then used to update the regions
$P_\alpha$, $N_\alpha$, and $U_\alpha$. Here we discuss several possible approaches.
Approach 1: Fixed Grid. The simplest approach is to partition the parameter
space into small cells, i.e. using an $h$-grid with $h$ small, and then assign each cell to
one of the sets. The assignment will be discussed later, but it involves evaluating
the functions $\lambda^+$ and $\lambda^-$ at each point of the grid. The method is accurate
if each basic cell contains only a fraction of the volume much smaller than $\varepsilon$.
However, this requires working with fine grids, whose size blows up quickly with
the number of parameters. In practice, this approach is feasible up to dimension
3 or 4 of the parameter space.
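The blow-up mentioned above is easy to quantify. Assuming a parameter space rescaled to the unit hypercube and a step of 1% per parameter, a fixed grid has $101^n$ points, i.e. about $10^6$ points for $n = 3$ and about $10^8$ for $n = 4$. The helper below (hypothetical name) computes this count.

```python
def grid_points(h: float, n_dims: int, width: float = 1.0) -> int:
    """Number of points of a uniform h-grid on the hypercube [0, width]^n_dims."""
    per_axis = int(round(width / h)) + 1  # points along one parameter axis
    return per_axis ** n_dims


for n in range(1, 6):
    print(n, grid_points(0.01, n))
```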


Approach 2: Adaptive Grid. To scale better with the dimension of the parameter
space, we can start by evaluating the $\lambda^+$ and $\lambda^-$ functions on a coarse grid, and refine
the grid iteratively only for cells that are assigned to the uncertain set, until a
minimum grid size is reached.
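A possible recursive realisation of the adaptive grid is sketched below. It is an assumption, not the authors' design (the adaptive strategy is listed as future work in Sect. 5): every uncertain cell is split into $2^n$ children by halving all edges, until the edges reach a minimum size, and classify_cell is a hypothetical callable implementing one of the cell tests discussed next.

```python
import itertools
from typing import Callable, List, Tuple

Cell = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, edge lengths)


def adaptive_partition(cell: Cell, classify_cell: Callable[[Cell], str],
                       min_size: float) -> List[Tuple[Cell, str]]:
    """Split only uncertain cells, halving every edge, until edges reach min_size."""
    label = classify_cell(cell)
    corner, size = cell
    if label != "U" or max(size) <= min_size:
        return [(cell, label)]
    half = tuple(s / 2 for s in size)
    leaves: List[Tuple[Cell, str]] = []
    # 2^n children: along each axis the child starts at corner_i or corner_i + half_i.
    for offsets in itertools.product((0, 1), repeat=len(corner)):
        child_corner = tuple(c + o * hs for c, o, hs in zip(corner, offsets, half))
        leaves.extend(adaptive_partition((child_corner, half), classify_cell, min_size))
    return leaves
```

Only cells labelled "U" are ever subdivided, so the evaluation effort concentrates near the threshold.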


Central to both approaches is how to guarantee that all points of a basic cell
belong to the same set, inspecting only a finite number of them. In particular,
we will limit the evaluation of the $\lambda^+$ and $\lambda^-$ functions to the vertices of each cell $c$,
i.e. to the points of the $h$-grid. Intuitively, this will work if the cell has a
small edge size compared to the rate of growth of the satisfaction function, and
the values of the satisfaction function at its vertices are all (sufficiently) above
or below the threshold. However, we need to quantify this "sufficiently" precisely.
We sketch here two exact methods and a heuristic one, which performs well in
practice. We discuss here how to check that a cell belongs to the positive set,
the negative case being symmetric.
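Enumerating the $2^n$ vertices of a basic cell and testing the lower bound on each of them can be written as below (function names hypothetical). Passing this vertex test is only a necessary condition for the whole cell to lie in $P_\alpha$; the methods that follow add the missing margin.

```python
import itertools
from typing import Callable, List, Sequence, Tuple


def cell_vertices(corner: Sequence[float], h: Sequence[float]) -> List[Tuple[float, ...]]:
    """The 2^n vertices of the basic cell with lower corner `corner` and edges h."""
    return [tuple(c + o * step for c, o, step in zip(corner, offsets, h))
            for offsets in itertools.product((0, 1), repeat=len(corner))]


def vertices_above(lam_minus: Callable[[Tuple[float, ...]], float],
                   corner: Sequence[float], h: Sequence[float], alpha: float) -> bool:
    """Necessary (not sufficient) condition for the cell to lie in P_alpha."""
    return all(lam_minus(v) > alpha for v in cell_vertices(corner, h))
```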


Method 1: Global Lipschitz bound. This approach relies on computing the Lipschitz
constant $L$ of the satisfaction function. This can be obtained by estimating
its derivatives (e.g. by finite differences or, better, by learning them using methods
discussed in [10]), and performing a global optimisation of the modulus of the
gradient after each call to smMC. Let $d(h)$ be the length of the largest diagonal
of a basic cell $c$ in an $h$-grid. Consider the smallest value of the satisfaction function
at one of the vertices of $c$, and call it $p$. Then the value of the satisfaction
function in the cell is surely greater than $p - L\,d(h)/2$, since every point of $c$ lies
within distance $d(h)/2$ of some vertex, whose value is at least $p$.
The test then is $p - L\,d(h)/2 > \alpha$.
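The global Lipschitz test amounts to a single comparison once $d(h)$ is known. In the sketch below (names hypothetical), vertex_values holds the values of the tested function at the vertices of $c$. With the figures reported for Case 1 in Sect. 4 ($L = 4.31$, $h = 0.0007$), the safety margin $L\,d(h)/2$ is about 0.0015.

```python
import math
from typing import Sequence


def half_diagonal(h: Sequence[float]) -> float:
    """d(h) / 2, where d(h) is the diagonal of a basic cell with edge lengths h."""
    return math.sqrt(sum(step * step for step in h)) / 2


def surely_positive(vertex_values: Sequence[float], lipschitz: float,
                    h: Sequence[float], alpha: float) -> bool:
    """Test p - L * d(h) / 2 > alpha, with p the smallest value at the cell's vertices."""
    p = min(vertex_values)
    return p - lipschitz * half_diagonal(h) > alpha
```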


Method 2: Local Lipschitz bound. The previous method will suffer if the slope of
the satisfaction function is large in some small region, as this will result in a large
Lipschitz constant everywhere. To improve it, we can split the parameter space
into subregions (for instance, by using a coarse grid), and compute the Lipschitz
constant in each subregion. An alternative we are investigating is to compute,
in each cell of the grid, a lower bound of the function $f(\theta)$ learned by the GP,
from its analytic expression.
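One way to obtain the subregion-wise constants is by finite differences on a grid, as in this one-dimensional sketch (the function name and the split into consecutive chunks are assumptions). Finite differences only see the sampled points, so they can underestimate the true Lipschitz constant between them.

```python
import math
from typing import List, Sequence


def local_lipschitz(xs: Sequence[float], fs: Sequence[float], n_regions: int) -> List[float]:
    """Largest finite-difference slope in each of n_regions consecutive chunks of a 1D grid."""
    slopes = [abs(fs[i + 1] - fs[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    chunk = math.ceil(len(slopes) / n_regions)
    return [max(slopes[j:j + chunk]) for j in range(0, len(slopes), chunk)]
```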


Heuristic Method. In order to speed up computation and avoid computing
Lipschitz constants, we can make the function $\lambda^-$ more strict. Specifically, we
can use a larger $\beta_\delta$ than the one required by our confidence level $\delta$. For instance,
for a 95% confidence $\beta_\delta = 1.96$, while we can use instead $\beta_\delta = 3$, corresponding
roughly to a confidence of 99.7%. We couple this with a choice of the grid step $h$ at
least one order of magnitude smaller than the lengthscale of the kernel learned
from the data (which is inversely related to the Lipschitz constant of the kernel and
of the satisfaction function), which ensures that the satisfaction function will
vary very little in each cell. We can then be confident that if the strict $\lambda^-$ is above the
threshold at all vertices of the cell, then the same will hold for all points inside
$c$ for the less strict $\lambda^-$.
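The heuristic can be sketched as follows, assuming again that $\Psi$ is the standard Gaussian CDF. Three hypothetical helpers are shown: the confidence level implied by a multiplier $\beta$ (for $\beta = 3$ it is $2\Psi(3) - 1 \approx 0.997$), the strict vertex test, and the check that the grid step is at least ten times smaller than the kernel lengthscale.

```python
from statistics import NormalDist
from typing import Sequence

PSI = NormalDist()  # standard Gaussian CDF, assumed to play the role of Psi


def confidence_of(beta: float) -> float:
    """Two-sided Gaussian confidence level associated with the multiplier beta."""
    return 2.0 * PSI.cdf(beta) - 1.0


def heuristic_positive(mu_vertices: Sequence[float], sigma_vertices: Sequence[float],
                       alpha: float, beta_strict: float = 3.0) -> bool:
    """Strict lower bound Psi(mu - beta_strict * sigma) above alpha at every vertex."""
    return all(PSI.cdf(m - beta_strict * s) > alpha
               for m, s in zip(mu_vertices, sigma_vertices))


def step_small_enough(h: Sequence[float], lengthscale: float, factor: float = 10.0) -> bool:
    """Grid step at least `factor` times smaller than the kernel lengthscale."""
    return all(step <= lengthscale / factor for step in h)
```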


Refinement Step. After having built the sets $P_\alpha$, $N_\alpha$, and $U_\alpha$, we check whether the
volume of $U_\alpha$ is below the tolerance threshold. If so, we stop and return these
sets. Otherwise, we need to increase the precision of the satisfaction function
near the uncertain region. This essentially means reducing the variance inside
$U_\alpha$, which can be obtained by increasing the number of observations in this
region. Hence, the refinement step samples points from the undefined region
$U_\alpha$, simulates the model a few times at each of these points, computes the truth value
of $\varphi$ for each trace, and adds these points to the training set $S$ of the smoothed
model checking process. This refinement reduces the uncertainty bound in the
undefined region, which leads some parts of this region to be classified as positive
($P_\alpha$) or negative ($N_\alpha$). We iterate this process until the exit condition
$\mathrm{vol}(U_\alpha)/\mathrm{vol}(\Theta) < \varepsilon$ is satisfied. The convergence of the algorithm is rooted in the properties of
smoothed model checking, which is guaranteed to converge to the true function
with vanishing variance as the number of observation points goes to infinity.
In practice, the method converges quite fast, unless the problem is very hard
(the true satisfaction function is close to the threshold for a large fraction of the
parameter space).
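The refinement step can be sketched as below. Drawing new points uniformly at random inside each uncertain cell is an assumption (the text only says that points are sampled from $U_\alpha$), and run_and_check is a hypothetical callable that simulates one trajectory at θ and monitors φ on it.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[Point, Point]  # (lower corner, edge lengths)
Sample = Tuple[Point, int, int]  # (theta, runs satisfying phi, total runs)


def refine_samples(samples: List[Sample], uncertain_cells: Sequence[Cell],
                   run_and_check: Callable[[Point, random.Random], bool],
                   points_per_cell: int = 1, n_runs: int = 10,
                   seed: int = 0) -> List[Sample]:
    """Add new training points drawn uniformly inside the uncertain cells."""
    rng = random.Random(seed)
    new_samples = list(samples)  # keep all previous observations
    for corner, size in uncertain_cells:
        for _ in range(points_per_cell):
            theta = tuple(c + rng.random() * s for c, s in zip(corner, size))
            successes = sum(run_and_check(theta, rng) for _ in range(n_runs))
            new_samples.append((theta, successes, n_runs))
    return new_samples
```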


4 Results 


Implementation. We have implemented our algorithm in Python 3.6. The code
is available at http://simonesilvetti.com/pycheck/. To improve the scalability of
our algorithm, we profiled it to identify the most computationally expensive
steps among simulating the PCRN, checking the MITL formulae at each step,
running smMC, and partitioning the parameter space. The most expensive part in our
tests turned out to be the simulation step, which we performed using the Gillespie
SSA algorithm [1]. To speed up simulations, we ran them in parallel, leveraging
the Numba [29] package for Python, which is well suited to accelerating array-oriented
and math-heavy Python code. The smoothed model checking step, instead, is
essentially independent of the number of repetitions: its execution
time depends on the number of training points. This is why, compared
with [6], we increased the number of simulations per parameter point and reduced
the number of parameter points. We ran all the experiments on a Dell XPS with an Intel
Core i7-7700HQ 2.8 GHz CPU and 8 GB of 1600 MHz memory, running Windows 10 Pro.
SIR Epidemic Model. We consider the popular SIR epidemic model [30],
which is widely used to simulate the spreading of a disease among a population.
The population of N individuals is divided into three classes:


- susceptible S: individuals that are healthy and vulnerable to the infection;
- infected I: individuals that are actively spreading the disease;
- recovered R: individuals who gained immunity to the disease.


The version of the SIR model we consider is defined by the following two chemical
reactions:


$r_i: S + I \to 2I, \qquad a_i = k_i \cdot \frac{X_S \cdot X_I}{N}$
$r_r: I \to R, \qquad a_r = k_r \cdot X_I$


Here, $r_i$ describes the possibility that a healthy individual gets the disease and
becomes infected, and the reaction $r_r$ models the recovery of an infected agent.
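The two reactions can be simulated with the Gillespie SSA mentioned in the implementation notes. The following plain-Python sketch (no Numba, no parallelism; function name, defaults and time horizon are assumptions) draws an exponential waiting time with rate $a_i + a_r$ and then picks a reaction with probability proportional to its propensity.

```python
import random
from typing import List, Optional, Tuple

State = Tuple[float, int, int, int]  # (time, S, I, R)


def simulate_sir(k_i: float, k_r: float, s0: int = 95, i0: int = 5, r0: int = 0,
                 t_max: float = 200.0, rng: Optional[random.Random] = None) -> List[State]:
    """Gillespie SSA for r_i: S + I -> 2I (a_i = k_i*S*I/N) and r_r: I -> R (a_r = k_r*I)."""
    rng = rng or random.Random()
    n = s0 + i0 + r0
    t, s, i, r = 0.0, s0, i0, r0
    trace = [(t, s, i, r)]
    while t < t_max:
        a_inf = k_i * s * i / n
        a_rec = k_r * i
        a_total = a_inf + a_rec
        if a_total == 0.0:  # I = 0 is absorbing: no reaction can fire any more
            break
        t += rng.expovariate(a_total)  # exponentially distributed waiting time
        if rng.random() * a_total < a_inf:  # infection with probability a_inf / a_total
            s, i = s - 1, i + 1
        else:  # otherwise a recovery
            i, r = i - 1, r + 1
        trace.append((t, s, i, r))
    return trace
```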
We describe the model as a PCRN where $k_i \in [0.005, 0.3]$, $k_r \in [0.005, 0.2]$,
with initial population $(S, I, R) = (95, 5, 0)$, and we consider the following MITL
formula:


$\varphi = (I > 0)\, \mathbf{U}_{[100, 120]}\, (I = 0) \qquad (3)$


This formula expresses that the disease becomes extinct (i.e. $I = 0$) between 100
and 120 time units. Note that for this model extinction will eventually happen
with probability one, but the time of extinction depends on the parameters
$\theta = (k_i, k_r)$. In the following, we report experiments to synthesise the parameter
region such that $P_\varphi(\theta) > \alpha$, with $\alpha = 0.1$, volume tolerance $\varepsilon = 0.1$, and
confidence $\delta = 95\%$. We consider all possible combinations of free parameters
to explore (i.e. $k_i$ alone, $k_r$ alone, and $k_i$ and $k_r$ together). The initial training set of the
smoothed model checking approach has been obtained by sampling the truth
value on the parameters arranged in a grid as described in Sect. 3, of 40
points for the 1D case and 400 points for the 2D case. The satisfaction probability of
each parameter vector in the training set, as well as of the parameter
vectors sampled by the refinement process, has been obtained by simulating the
PCRN and evaluating the MITL formula (3) with 1000 repetitions per parameter
point.
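Formula (3) has a simple reading on SIR trajectories: since $I = 0$ is absorbing (no reaction can fire once the infection is gone), $(I > 0)\,\mathbf{U}_{[100,120]}\,(I = 0)$ holds iff the first time $I$ hits 0 lies in $[100, 120]$. The sketch below (hypothetical names) monitors piecewise-constant traces of (time, S, I, R) tuples with this shortcut, which is specific to this model and not a general MITL monitor, and averages the outcomes over runs. Traces must be simulated at least up to time 120, otherwise a late extinction is counted as a failure.

```python
from typing import Iterable, Optional, Sequence, Tuple

State = Tuple[float, int, int, int]  # (time, S, I, R), constant between events


def extinction_time(trace: Sequence[State]) -> Optional[float]:
    """First time at which I = 0, or None if the trace never reaches it."""
    for t, _s, i, _r in trace:
        if i == 0:
            return t
    return None


def satisfies_phi(trace: Sequence[State], a: float = 100.0, b: float = 120.0) -> bool:
    """(I > 0) U_[a,b] (I = 0): with I = 0 absorbing and a > 0, this holds iff
    the first extinction time lies in [a, b]."""
    tau = extinction_time(trace)
    return tau is not None and a <= tau <= b


def estimate_probability(traces: Iterable[Sequence[State]]) -> float:
    """Statistical estimate of P_phi(theta) from independent runs at one theta."""
    outcomes = [satisfies_phi(tr) for tr in traces]
    return sum(outcomes) / len(outcomes)
```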


Efficiency, Accuracy, and Scalability. The execution times of the experiments
are reported in Table 1 (left). The results show a good performance of
our statistical algorithm, despite its being implemented in Python rather than in a
more efficient language like C. Our execution times are 42%, 18% and 7% of those
of the exact method reported in [7] for Case 1, Case 2 and Case 3, respectively.
Our results are reported using the heuristic method to
compute the sets and a fixed grid with a small step size $h$.
In Case 1, we also compare the three methods to classify the regions, computing
the derivative of the satisfaction probability function by finite differences
and (i) optimising it globally to obtain the Lipschitz constant (equal to 4.31),
(ii) optimising it in every cell of the fine prediction grid to compute a local
Lipschitz constant (in each cell). For (iii) the heuristic method, we use $\beta_\delta = 3$
instead of $\beta_\delta = 1.96$, and a grid step of order $10^{-4}$, three orders of magnitude
smaller than the lengthscale of the kernel, which is set by marginal likelihood
optimisation and equals 0.1. All three methods gave the same results for the grid size we used.
More specifically, the maximum displacement of the approximated satisfaction
probability inside a cell is estimated to be 0.003.

As a statistical accuracy test, we computed the "true" value of the satisfaction
probability (by deep statistical model checking, using 10000 runs) for points in
the positive and negative sets close to the undefined set, and counted how many
times these points were misclassified. More specifically, in Case 1 we consider
300 equally spaced points between 0.07 and 0.1 (note that a portion of the
undefined region is located in a neighbourhood of 0.05, see Fig. 1). All points
turned out to be classified correctly, which points to the accuracy of the smMC
prediction.

We also performed a scalability test with respect to the size of the state
space of the PCRN model, increasing the initial population size N of the SIR
model (Case 1). The results are reported in Table 1 (right). We increase the
initial population size while maintaining the original proportion between the classes
of the initial population $(S, I, R) = (95, 5, 0)$. Moreover,
we consider different thresholds $\alpha$ and volume tolerances $\varepsilon$ in order to force the
algorithm to execute at least one refinement step, as the shape of the satisfaction
function changes with N. The execution time increases moderately, following a
linear trend.


Table 1. (LEFT) Results for the Statistical Parameter Synthesis for the SIR model
with N = 100 individuals and the formula $\varphi = (I > 0)\, \mathbf{U}_{[100, 120]}\, (I = 0)$. We report
the mean and standard deviation of the execution time of the algorithm. The volume
tolerance is set to 10% and the threshold $\alpha$ is set to 0.1. The h-grid column shows the
size $h$ of the grid used to compute the positive, negative, and uncertain sets. (RIGHT)
Scalability of the method w.r.t. the size of the state space of the SIR model, increasing
the initial population N. $\alpha$ and $\varepsilon$ are the threshold and volume tolerance used in the
experiments.


(LEFT)
Case | $k_i \times k_r$              | h-grid         | Time (sec)
1    | [0.005, 0.3] × 0.05           | 0.0007         | 17.92 ± 2.61
2    | 0.12 × [0.005, 0.2]           | 0.0005         | 4.87 ± 0.01
3    | [0.005, 0.3] × [0.005, 0.2]   | (0.003, 0.002) | 116.4 ± 4.06

(RIGHT)
Pop. size | $\alpha$ | $\varepsilon$ | Time (sec)
200       | 0.13     | 10%           | 13.05 ± 3.22
400       | 0.08     | 10%           | 13.86 ± 5.99
800       | 0.2      | 4%            | 15.02 ± 0.05
1000      | 0.23     | 4%            | 17.44 ± 0.23
2000      | 0.31     | 4%            | 28.81 ± 0.07
[Fig. 1 panels: (a) Case 1, (b) Zoom of Case 1, (c) Case 2, (d) Case 3; horizontal axes $k_i$ or $k_r$.]


Fig. 1. (a), (c) and (d) show the partition of the parameter space for Cases 1, 2, and 3,
respectively. The positive area $P_\alpha$ is depicted in red, the negative area $N_\alpha$ in blue,
and the undefined region $U_\alpha$ in yellow. (a) and (c) are one-dimensional cases: the
x-axis reports the parameter explored ($k_i$ and $k_r$, respectively), and the y-axis shows
the value of the satisfaction function and the confidence bounds (for $\beta_\delta = 3$). The green
horizontal line is the threshold $\alpha = 0.1$. (d) shows a two-dimensional parameter space,
hence no confidence bounds are represented. The circular dots represent the training
set. In (b) we zoom into a portion of the parameter space of (a) to visualise the cells
with base length equal to $h$ and height equal to the span of the confidence bounds.
(Color figure online)


5 Conclusions 


We presented an efficient statistical algorithm for parameter synthesis, to iden- 
tify parameters satisfying MITL specifications with a probability greater than a 
certain threshold. The algorithm is based on Bayesian statistics and leverages the 
powerful parametric verification framework of Smoothed Model Checking, inte- 
grating it into an active learning refinement loop which drives the computational 
effort of simulations near the critical region concentrated around the threshold
$\alpha$. The developed approach shows good performance in terms of execution time
and outperforms the exact algorithm developed in [7], retaining good accuracy 
at the price of having only statistical guarantees. 

Note that we compared with the performance of [7] and not of their GPU
implementation [12], as our method currently uses only CPU computing power.
However, it can be implemented on a GPU, leveraging e.g. [31], and we
expect a substantial increase in performance. Fully distributing the computations
of the algorithm over CPUs, beyond stochastic simulation alone, is also
feasible, the hard part being to parallelise GP inference [32].

Other directions for future work include the implementation of the adaptive
grid strategy to construct the $P_\alpha$, $N_\alpha$, and $U_\alpha$ regions, given the output of
smMC, and a divide-and-conquer strategy to split the parameter space (and the
uncertain set $U_\alpha$) into subregions, to reduce the complexity of smMC. These
two extensions are necessary to scale the method to higher dimensions, up to
6 to 8 parameters. To scale even further, we plan to integrate techniques to speed
up GP reconstruction: more classical sparse approximation techniques [10] and
more recent methods for GPs tailored to work on grids [33,34]. These techniques
have a computational cost of $O(n)$ instead of the $O(n^3)$ cost of the standard
implementation. Finally, we aim to combine our approach with the exact algorithm
developed in [7]. The idea is to use our approach for a rough exploration of the
parameter space, to cut out the regions that are, with high statistical confidence,
above or below the considered threshold, and to apply the exact approach in
the remaining area, when feasible.
Abstract. 2LS is a C program analyser built upon the CPROVER
infrastructure. 2LS is bit-precise and it can verify and refute program 
assertions and termination. 2LS implements template-based synthesis 
techniques, e.g. to find invariants and ranking functions, and incremen- 
tal loop unwinding techniques to find counterexamples and k-induction 
proofs. New features in this year’s version are improved handling of heap- 
allocated data structures using a template domain for shape analysis and 
two approaches to prove program non-termination. 


1 Overview 


2LS is a static analysis and verification tool for sequential C programs that 
is based on an algorithm called kIkI (k-invariants and k-induction) [1], which 
combines bounded model checking, k-induction, and abstract interpretation into 
a single, scalable framework. 2LS relies on incremental SAT solving to employ 
all these techniques simultaneously in order to find proofs and refutations of 
assertions, as well as to perform termination analysis [2]. 

This year’s competition version introduces a new abstract shape domain 
allowing 2LS to reason about properties of programs manipulating the heap and
dynamic data structures, and a non-termination analysis, which serves as a 
counterpart to the existing termination analysis and allows 2LS to prove non- 
termination of a program. 


Architecture. 2LS is built upon the CPROVER infrastructure [3] and thus uses 
GOTO programs as the internal program representation. It first performs vari- 
ous static analyses and transformations of the program, including resolution of 
function pointers, points-to analysis, and insertion of assertions guarding against 
invalid pointer and memory operations. The analysed program is then translated
into an acyclic, over-approximate static single assignment (SSA) form, in which
loops are cut at the edges returning to the loop head. Subsequently, 2LS refines 
this over-approximation by computing inductive invariants in various abstract 
domains represented by parametrised logical formulae, so-called templates [1]. 
The competition version uses the interval domain for numerical variables and 
the new shape domain for pointer-typed variables described below. 

The kIkI algorithm [1] operates on the SSA form, which is translated into
a CNF formula over a bitvector representation of program configurations and 
given to a SAT solver. This formula is incrementally extended and amended to 
perform loop unwindings and abstract domain operations. The model returned 
by the solver is then used either to refine the predicates representing abstract 
values or to find a counterexample refuting the property to be checked. A more 
detailed description of the 2LS architecture can be found in the tool paper [7]. 


2 New Features 


For SV-COMP’18, apart from various bug fixes and minor improvements, two
major improvements of 2LS have been implemented: namely, support for dealing
with inductive list-like data structures and support for proving program
non-termination. Although 2LS supports certain interprocedural analyses, the
competition version performs both analyses in a monolithic way, i.e. after inlining
function calls. These improvements tackle weaknesses observed in previous
years in the heap and memory safety categories, and they also boost
2LS’ capabilities in non-termination analysis.


2.1 Memory Safety and Heap Invariants 


To support shape analysis of dynamic data structures, a new abstract domain has
been added to 2LS to express invariants describing heap configurations in the
context of the bitvector logic used by 2LS [4]. The domain is based on recording
(1) information about abstract heap objects pointed to by pointer variables and
(2) information about reachability of abstract objects using pointer access
paths [6]. Here, an abstract heap object represents all objects allocated at a
given program location. The access paths then record which target abstract
objects can be reached from a given source abstract object while going through
some set of intermediary objects. For instance, the list in Fig. 1 would be encoded
as $list = \&o_1 \wedge path(o_1, nxt, \{o_1, o_2\}, NULL)$, meaning that list points to
an object $o_1$ and there is a path from $o_1$ via nxt fields of abstract objects $o_1$ and
$o_2$ to NULL. This representation is integrated into kIkI as a template over
pointer-typed variables and fields of dynamic objects. The template is a
parametrised logical formula. The parameters encode the sets of memory objects
that can be pointed to by each pointer-typed variable as well as the set of paths
that can lead from each dynamic object to other objects. 2LS computes these sets
using an incremental SAT solver. This allows 2LS to prove or to refute assertions
related to the manipulation of dynamically linked data structures. The supported
properties include, for instance, null-pointer dereferencing, double-free, and
memory leaks. Assertions for these properties are automatically instrumented
into the code.


[Fig. 1 image: list points to a node allocated at $o_1$; nxt links lead through nodes allocated at $o_1$ and $o_2$ to NULL.]
Fig. 1. A singly-linked list with nodes allocated at two different program locations.
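To make the encoding concrete, the sketch below computes, on a concrete heap, the facts that the template records: the abstract object of the list head, the abstract objects traversed via nxt, and whether the path ends in NULL. This is an explicit-state illustration with hypothetical names; 2LS instead infers these sets symbolically, as template parameters, with an incremental SAT solver.

```python
from typing import Dict, Optional, Set, Tuple


def abstract_path(alloc_site: Dict[str, str], nxt: Dict[str, Optional[str]],
                  head: str) -> Tuple[str, Set[str], str]:
    """Abstract a concrete list as path(o_head, nxt, {objects traversed}, end).

    alloc_site maps each concrete node to its abstract object (allocation site);
    nxt maps each node to its successor, or to None for NULL."""
    traversed: Set[str] = set()
    seen: Set[str] = set()  # concrete nodes already visited, to stop on cycles
    node: Optional[str] = head
    while node is not None and node not in seen:
        seen.add(node)
        traversed.add(alloc_site[node])
        node = nxt[node]
    end = "NULL" if node is None else "cycle"
    return alloc_site[head], traversed, end
```

For a list whose first node comes from $o_1$ and whose two following nodes come from $o_2$, it returns ("o1", {"o1", "o2"}, "NULL"), i.e. $path(o_1, nxt, \{o_1, o_2\}, NULL)$.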


2.2 Proving Non-termination 


Last year’s version of 2LS provided a technique for proving termination based on
linear lexicographic ranking functions synthesised using templates over bitvectors
[2], but the tool was unable to prove non-termination except for trivial
cases. For SV-COMP’18, two techniques for proving non-termination have been
added [5]. Both approaches are relatively simple, yet appear to be reasonably
efficient on the SV-COMP benchmarks.

The first approach is based on finding singleton recurrence sets. All loops
are unfolded $k$ times (with $k$ being incrementally increased), followed by a check
whether there is some loop $L$ and a program configuration that can be reached
at the head of $L$ after both $k'$ and $k$ unwindings for some $k' < k$. Such a
check can be easily formulated in 2LS as a formula over the SSA representation
of programs with loops unfolded $k$ times. This technique is able to find
lasso-shaped executions in which a loop returns to the same program configuration
every $k - k'$ iterations after $k'$ initial iterations.
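An explicit-state analogue of the singleton recurrence check is sketched below: a single concrete execution is unrolled, and the loop-head configuration seen after $k'$ iterations is looked up again after $k$ iterations. The names are hypothetical and this is not how 2LS works internally, since 2LS asks a SAT solver whether some reachable configuration repeats, over the SSA form with loops unfolded $k$ times.

```python
from typing import Callable, Dict, Hashable, Optional, Tuple


def find_lasso(init: Hashable, guard: Callable[[Hashable], bool],
               body: Callable[[Hashable], Hashable],
               k_max: int) -> Optional[Tuple[int, int]]:
    """Return (k', k) if the loop-head configuration after k' iterations
    reappears after k iterations, unrolling at most k_max times."""
    seen: Dict[Hashable, int] = {}  # loop-head configuration -> iteration index
    config = init
    for k in range(k_max + 1):
        if not guard(config):  # the loop exits: this execution terminates
            return None
        if config in seen:
            return seen[config], k
        seen[config] = k
        config = body(config)
    return None  # inconclusive within k_max unwindings
```

For an 8-bit variable and the loop while (x != 5) x += 2 started at x = 0, find_lasso returns (0, 128): x stays even, never equals 5, and wraps around after 128 iterations.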

The second approach tries to reduce the number of unwindings by looking
for loops that generate an arithmetic progression over every integer variable.
More precisely, it looks for loops $L$ for which each integer variable $x$ can be
associated with a constant $c_x$ such that every iteration of $L$ changes the value
of $x$ to $x + c_x$, keeping non-integer variables unchanged. Two queries are used
to detect such loops: the first one asks whether there is a configuration $\vec{x}$ and
a constant vector $\vec{c}$ (with the vectors ranging over all integer variables modified
in the loop and constants from their associated bitvector domains) such that
one iteration of $L$ ends in the configuration $\vec{x} + \vec{c}$, while the second makes sure
that there is no configuration $\vec{x}'$ from which one iteration of $L$ would terminate
in a configuration other than $\vec{x}' + \vec{c}$. If such a loop $L$ and a constant vector $\vec{c}$
are found, non-termination of $L$ can be proved as follows. First, we gradually
exclude each configuration $\vec{x}$ reachable at the head of $L$ for which there is some $k$
such that $L$ cannot be executed from $\vec{x} + k \cdot \vec{c}$ (intuitively meaning that $L$ cannot
be executed $k + 1$ times from $\vec{x}$). Second, we check whether there remains some
non-excluded configuration reachable at the head of $L$.
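On toy bitvector domains, both queries and the exclusion step can be checked by exhaustive enumeration, as in this sketch (hypothetical names; 2LS poses the corresponding questions to a SAT solver). Since $\vec{x} + k \cdot \vec{c}$ taken modulo $2^w$ repeats with a period dividing $2^w$, trying $k < 2^w$ suffices.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Config = Tuple[int, ...]


def progression_constant(guard: Callable[[Config], bool], body: Callable[[Config], Config],
                         width: int, n_vars: int) -> Optional[Config]:
    """Return c if every iteration maps x to x + c (mod 2**width), else None."""
    mod = 1 << width
    c: Optional[Config] = None
    for x in product(range(mod), repeat=n_vars):  # exhaustive: toy domains only
        if not guard(x):  # the loop body does not execute from x
            continue
        delta = tuple((y - xi) % mod for xi, y in zip(x, body(x)))
        if c is None:
            c = delta
        elif delta != c:  # some iteration is not a shift by the same constant
            return None
    return c


def nonterminating_starts(guard: Callable[[Config], bool], c: Config, width: int,
                          reachable: Sequence[Config]) -> List[Config]:
    """Keep the reachable loop-head configurations x whose guard holds at x + k*c for all k."""
    mod = 1 << width
    return [x for x in reachable
            if all(guard(tuple((xi + k * ci) % mod for xi, ci in zip(x, c)))
                   for k in range(mod))]
```

For the 4-bit loop while (x != 3) x += 2, the constant is (2,); starting from x = 0 the guard never fails, whereas from x = 1 the loop exits after one iteration.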

The termination and non-termination analyses are run in parallel, and the 
first definite answer is used. Among the new non-termination analyses, several 
rounds of unwinding are first tried with the singleton recurrence set approach. If 
that is not sufficient, the arithmetic progression approach is tried. If that does not 
succeed either, further rounds of unwinding with the former approach are run. 
3 Strengths and Weaknesses


2LS’ core algorithm, kIkI, is designed to be efficient for simultaneously finding
proofs as well as refutations. Our SSA encoding allows us to introduce abstrac- 
tions only at certain program points where these are necessary to infer the pred- 
icates required to construct proofs (e.g. invariants, ranking functions, recurrence 
sets). The remaining program is represented in a bit-precise large-block encoding. 

Compared to the previous editions of the competition, 2LS is now able to 
reason about dynamic linked data structures. The approach used is currently 
able to handle various forms of linked lists (singly- or doubly-linked, a subset 
of nested or circular lists). However, more elaborate template domains will be 
required to handle other dynamic data structures such as trees and more general 
graph structures. 

2LS’ template-based approach to abstract interpretation allows easy combi- 
nation of domains. We combine the heap domain with intervals over bitvectors, 
which is sufficient for many benchmarks. However, some benchmarks, e.g. those
requiring reasoning about array contents, demand stronger invariants than we
are currently able to infer. 

The termination analysis scales well, but is currently limited to rather sim- 
ple termination conditions (lexicographic linear). The newly implemented non- 
termination analyses are surprisingly effective on many SV-COMP termination 
benchmarks (638 out of 657 non-termination benchmarks proved). However, if 
a larger number of unwindings is needed the approach becomes quite inefficient. 
kIkI does not yet support recursion, which is another limitation, in particular 
w.r.t. the SV-COMP termination benchmark set, which contains a large number 
of recursive programs. The output of witnesses in the new categories (memory 
safety and termination) is still lacking (more than 550 points have been lost 
there). 


4 Tool Setup 


The competition submission is based on 2LS version 0.6.¹ Installation instructions
are given in the file COMPILING. The executable 2ls is in the directory
src/2ls. See the 2ls wrapper script (contained in the tarball) for the relevant
command line options given to 2LS. The BenchExec script is called two_ls.py
and the benchmark definition file 2ls.xml. As a back end, the competition
submission of 2LS uses Glucose 4.0. 2LS competes in all categories except 
Concurrency. 


5 Software Project 


2LS is maintained by Peter Schrammel with pull requests contributed by the 
community. It is publicly available under a BSD-style license. The source code 
is available at http://www.github.com/diffblue/2ls. 


¹ Executable available at https://gitlab.com/sosy-lab/sv-comp/archives/tags/svcomp18.
Abstract. This paper presents the YOGAR-CBMC tool for verification
of multi-threaded C programs. It employs a scheduling-constraint-based
abstraction refinement method for bounded model checking of concurrent
programs. To obtain effective refinement constraints, we have proposed
the notion of Event Order Graph (EOG), and have devised two
graph-based algorithms over EOG for counterexample validation and
refinement generation. The experiments in SV-COMP 2017 show
promising results for our tool.


1 Verification Approach and Software Architecture 


Bounded model checking (BMC) is among the most efficient techniques for con- 
current program verification [1]. However, due to non-deterministic interleavings, 
a huge encoding is required for an exact description of the thread interaction. 

YOGAR-CBMC is a verification tool for multi-threaded C programs based 
on shared variables under sequential consistency (SC). For these programs, we 
have observed that the scheduling constraint, which defines that “for any pair 
(w,r) s.t. r reads the value of a variable v written by w, there should be no 
other write of v between them”, significantly contributes to the complexity of 
the behavior encoding. In existing BMC approaches, the scheduling constraint
is encoded into a complicated logic formula, whose size is cubic in the
number of shared memory accesses [2].
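For a single concrete interleaving, the scheduling constraint is easy to check, as in the sketch below (hypothetical names and event representation). The difficulty addressed by YOGAR-CBMC is that BMC must encode this condition symbolically for all interleavings at once, which is what leads to the cubic formula mentioned above.

```python
from typing import Dict, Sequence, Tuple

Event = Tuple[str, str, str]  # (event id, "W" for write or "R" for read, shared variable)


def respects_scheduling_constraint(order: Sequence[Event], rf: Dict[str, str]) -> bool:
    """Check one interleaving against a reads-from map rf (read id -> write id).

    For every pair (w, r) in which r reads variable v from w, w must come first
    and no other write of v may occur between w and r."""
    pos = {event_id: k for k, (event_id, _kind, _var) in enumerate(order)}
    for read_id, write_id in rf.items():
        w_pos, r_pos = pos[write_id], pos[read_id]
        if w_pos > r_pos:  # a read cannot observe a later write
            return False
        var = order[r_pos][2]
        if any(kind == "W" and v == var for _eid, kind, v in order[w_pos + 1:r_pos]):
            return False  # an intervening write of v would be read instead
    return True
```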

To avoid the huge encoding of the scheduling constraint, YOGAR-CBMC performs
abstraction refinement by weakening and strengthening the scheduling
constraint [3]. Figure 1 gives a high-level overview of its architecture.
We initially ignore the scheduling constraint and thus obtain an
over-approximating abstraction of the original program (w.r.t. the given loop
unwinding depth). If the property is safe on the abstraction, then it also holds
on the original bounded program. Otherwise, an abstraction counterexample is
obtained, and the abstraction is refined if the counterexample is infeasible.


[Fig. 1 diagram: Multi-Threaded C Program and Error States → Over-approximation (ignore the scheduling constraint) → Abstraction Model; an abstraction Counterexample goes to Graph-based EOG Validation ([Infeasible] → Graph-based Refinement; [Not Sure] → Constraint-based EOG Validation, then [Infeasible] → Constraint-based Refinement or [Feasible] → True Counterexample); refinements add a Refinement Constraint to the abstraction; a safe abstraction yields a Proof.]


Fig. 1. High-level overview of YOGAR-CBMC architecture. 


The performance of this method significantly depends on the generated
refinement constraints. Ideally, a refinement constraint should have a small size
yet reduce a large amount of the search space during each iteration. To achieve
this goal, we have proposed the notion of Event Order Graph (EOG), and have
devised two graph-based algorithms over EOG for counterexample validation
and refinement generation. Given an abstraction counterexample $\pi$, the
corresponding EOG $G_\pi$ captures all the event order requirements of $\pi$ defined in
the scheduling constraint. The counterexample $\pi$ is feasible iff the EOG $G_\pi$ is
feasible. To validate the feasibility of $G_\pi$, we have proposed several deduction
rules to deduce the implicit order requirements of $G_\pi$. If any cycle exists in
$G_\pi$, then both $\pi$ and $G_\pi$ are infeasible. A graph-based refinement algorithm
is then employed to analyse all the possible "kernel reasons" of all cycles. By
eliminating "redundant" kernel reasons, we can usually obtain a small
set of "core kernel reasons", which can usually be encoded into a small
refinement constraint. The experimental results show that: (1) our graph-based EOG
validation method is powerful enough in practice: given an infeasible EOG, it
identifies the infeasibility with rare exceptions; (2) our graph-based
refinement method is effective: if some cycle exists in $G_\pi$, it can usually obtain
a small refinement constraint which reduces a large amount of the search space.
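The cycle test at the heart of the graph-based validation can be sketched with a depth-first search. This only detects cycles among the given order edges (an edge (a, b) meaning that event a must occur before event b); the deduction rules that add implicit order requirements to $G_\pi$ are not modelled, and all names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Tuple


def find_cycle(edges: Iterable[Tuple[Hashable, Hashable]]) -> Optional[List[Hashable]]:
    """Return one cycle of the directed graph as a list of nodes, or None if acyclic."""
    succ: Dict[Hashable, List[Hashable]] = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        succ.setdefault(v, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the DFS stack / finished
    color = {n: WHITE for n in succ}
    stack: List[Hashable] = []

    def dfs(u: Hashable) -> Optional[List[Hashable]]:
        color[u] = GREY
        stack.append(u)
        for v in succ[u]:
            if color[v] == GREY:  # back edge: v is on the stack, so a cycle closes
                return stack[stack.index(v):]
            if color[v] == WHITE:
                cycle = dfs(v)
                if cycle is not None:
                    return cycle
        stack.pop()
        color[u] = BLACK
        return None

    for node in succ:
        if color[node] == WHITE:
            cycle = dfs(node)
            if cycle is not None:
                return cycle
    return None
```

A returned cycle witnesses an infeasible EOG; None means no cycle was found among these edges, the "not sure" case handled by constraint solving.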

If no cycle exists in $G_\pi$, we are not sure whether the EOG is feasible or not.
We employ a constraint-based EOG validation process to further validate its
feasibility by constraint solving. If an infeasibility is determined, a
constraint-based refinement generation process is performed to refine the abstraction,
which obtains only one kernel reason of the infeasibility. With these
two constraint-based processes, our method is sound and complete
w.r.t. the given loop unwinding depth, as we have proved.
Consider the example shown in Fig. 2. We attempt to verify that m and n cannot both be 1 after threads thr1 and thr2 exit, a property that has a modular proof in this program. In this example, we have observed that:

(1) Excluding the 3049 CNF clauses encoding the pthread_create and pthread_join functions, the encodings of this program with and without the scheduling constraint have 10214 and 1018 CNF clauses, respectively. This indicates that the scheduling constraint contributes significantly to the complexity of the program encoding.
(2) During the verification, all the abstraction counterexamples are infeasible, and all of them have been identified as infeasible by our graph-based EOG validation method. This indicates that our graph-based EOG validation method is powerful enough in practice.
(3) The property is verified through only three refinements, and only 7 simple CNF clauses are added during the refinement processes. This indicates that the refinement constraints usually have small sizes yet prune a large part of the search space, and that our graph-based refinement method is effective.

#include <assert.h>
#include <pthread.h>

int x=1, y=1, m=0, n=0;
void* thr1(void* arg) {
  x=y+1;
  m=y;
  x=0;
  return 0;
}
void* thr2(void* arg) {
  y=x+1;
  n=x;
  y=0;
  return 0;
}
int main() {
  pthread_t t1, t2;
  pthread_create(&t1, 0, thr1, 0);
  pthread_create(&t2, 0, thr2, 0);
  pthread_join(t1, 0);
  pthread_join(t2, 0);
  assert(!(m == 1 && n == 1));
  return 0;
}

Fig. 2. An illustrative example.


2 Strengths and Weaknesses 


The strengths of our tool include: (1) Our approach is a general purpose tech- 
nique for multi-threaded C program verification, not assuming any special char- 
acteristics of the programs. Our tool supports nearly all features of C and 
PThreads. (2) Our approach is efficient in practice. Without the scheduling con- 
straint, the size of the encoding can be dramatically reduced. Moreover, it can 
usually verify the property with a small number of refinements, while the refine- 
ment constraints usually have small sizes. (3) Enhanced by the constraint-based 
counterexample validation and refinement generation processes, our approach 
is sound and complete w.r.t. the given loop unwinding depth. It provides both 
proofs and refutations for the property. If the property is found to be false, 
a counterexample will be provided. (4) As the abstractions usually have small 
sizes, our tool generally consumes less memory than those tools giving an exact 
description of the scheduling constraint. In this sense, our tool is more scalable. 

We have applied YOGAR-CBMC to the benchmarks in the concurrency track of SV-COMP 2017. Our tool successfully verified all these examples within 1550s and 43GB of memory, and it won the gold medal in the Concurrency Safety category of SV-COMP 2017 [4].

YOGAR-CBMC: CBMC with Scheduling Constraint 425

However, for programs in which the scheduling constraint is not the major part of the encoding, our method may still need dozens of refinements. Since the abstractions may then be of similar size to the monolithic encoding, our tool may perform worse than tools that use a monolithic encoding. Moreover, for real-world programs with a large number of read/write accesses and complex data structures, reducing the number of refinements and handling shared structure members more efficiently remain challenging problems.


3 Tool Setup and Configuration 


The binary file of YOGAR-CBMC for Ubuntu 16.04 (x86_64-linux) is available at https://gitlab.com/sosy-lab/sv-comp/archives. It is implemented on top of CBMC-4.9¹. Its setup and configuration are the same as those of CBMC. The tool-info module and the benchmark definition of our tool are “yogar-cbmc.py” and “yogar-cbmc.xml”, respectively.

Our tool needs two parameters of CBMC: --no-unwinding-assertions and --32. The unwind bound of YOGAR-CBMC is determined dynamically through a syntactic analysis. In particular, the bound is set to 2 for programs with arrays, and to n if some of the program’s for loops are upper-bounded by a constant n, which is the same as for MU-CSeq [5]. To run YOGAR-CBMC on a program <file>, use the following command:


./yogar-cbmc --no-unwinding-assertions --32 <file>


Participation/Opt Out. YOGAR-CBMC competes only in the concurrency 
category. 


4 Software Project and Contributors 


YOGAR-CBMC is developed at HPCL, School of Computers, National University of Defense Technology, and includes contributions by the authors of this paper. Its source code is available at https://github.com/yinliangze/yogar-cbmc. For more information, contact Liangze Yin.
Abstract. Our submission to SV-COMP’18 is a composite tool based on the software verification framework CPACHECKER and the static analysis platform FRAMA-C. The base verifier uses a combination of predicate and explicit value analysis with block-abstraction memoization, as in the CPA-BAM-BnB tool presented at SV-COMP’17. In this submission, we augment the verifier on reachability verification tasks with a slicer that removes statements irrelevant to the reachability of error locations in the analysed program. The slicer is based on a context-sensitive flow-insensitive separation analysis with typed polymorphic regions and a simple dependency analysis with transitive closures. The resulting analysis preserves reachability modulo possible non-termination while removing enough irrelevant code to achieve a considerable speedup of the main analysis. The slicer is implemented as a FRAMA-C plugin.


1 Verification Approach 


The submission presents a composite setting composed of the mature static verification tool CPACHECKER [1] and an experimental reachability slicer (a FRAMA-C [2] plugin), intended to speed up verification by pruning the verification scope prior to the application of the main analysis. By verification scope we mean the code to be analyzed rather than the search space explored by the main analysis: the slicer does not prune the search space as such, but rather removes statements (including function calls) that can be proved not to influence the verification outcome. The slicer included in this submission is currently only applicable to reachability verification tasks, though the underlying algorithm is not inherently limited to the reachability of a small number of error locations and so can potentially be extended to support, e.g., memory safety properties.

The slicer is based on a relatively simple mark-and-sweep algorithm, where the relevant statements are first identified by computing the transitive closure of the dependency relation, then marked, and finally the remaining statements are removed to produce a sliced verification task. The mark-and-sweep slicing is performed on top of a preliminary region analysis, which makes it possible to treat abstract memory locations, ascribed to the corresponding disjoint memory regions, essentially like ordinary unaliased program variables.
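The mark-and-sweep scheme can be illustrated on a toy dependency graph: a worklist traversal marks every statement that the error-location statements transitively depend on, and the sweep keeps only the marked ones. This is a sketch under assumptions, not CRUDE_SLICER itself (which is written in OCaml over FRAMA-C's AST and region analysis): statements are plain indices, the dependency relation is a precomputed adjacency matrix, and `DepGraph`, `mark_relevant`, and `sweep` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STMTS 32  /* hypothetical bound on the number of statements */

/* deps[s][t] is true if statement s depends on statement t
   (data or control dependency, computed beforehand). */
typedef struct {
    int n;
    bool deps[MAX_STMTS][MAX_STMTS];
} DepGraph;

/* Mark phase: transitive closure of the dependency relation, starting
   from the statements at error locations (the slicing criterion). */
void mark_relevant(const DepGraph *g, const int *criteria, int ncrit,
                   bool marked[MAX_STMTS]) {
    int worklist[MAX_STMTS];
    int top = 0;
    for (int s = 0; s < g->n; s++) marked[s] = false;
    for (int i = 0; i < ncrit; i++) {
        if (!marked[criteria[i]]) {
            marked[criteria[i]] = true;
            worklist[top++] = criteria[i];
        }
    }
    while (top > 0) {
        int s = worklist[--top];
        for (int t = 0; t < g->n; t++) {
            if (g->deps[s][t] && !marked[t]) {
                marked[t] = true;  /* each statement is pushed at most once */
                worklist[top++] = t;
            }
        }
    }
}

/* Sweep phase: keep only marked statements; returns how many remain. */
int sweep(int n, const bool marked[MAX_STMTS], int kept[MAX_STMTS]) {
    int k = 0;
    for (int s = 0; s < n; s++)
        if (marked[s]) kept[k++] = s;
    return k;
}
```

Because each statement enters the worklist at most once, the mark phase is linear in the number of dependency edges for this matrix representation.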

The region analysis implemented in the current submission is a conservative over-approximation of the context-sensitive flow-insensitive separation analysis with polymorphic regions for deductive verification. It was first described in [3] and later substantially extended in [4]. The conservative approximation is needed because the original analysis generally requires user annotations. The over-approximation is expressed in the form of additional dependencies introduced at the marking stage rather than in the region analysis itself. These dependencies make it possible to approximate reinterpretations of memory regions (corresponding to the use of unions and arbitrary pointer type casts), but not some corner cases of pointer arithmetic (mostly arithmetic that depends on a particular layout of structure fields), so the resulting analysis remains unsound in the general case. However, benchmarking the analysis with CPACHECKER as the reachability verifier on the tasks in the SV-COMP SoftwareSystems category showed no cases of unsoundness caused by the region analysis. This may be explained by the fact that most of the cases where the analysis is unsound with respect to a low-level C memory model are also regarded as undefined behavior by the C standard, and so are probably rare in practice.


2 Software Architecture 


The main CPACHECKER verification framework is included in the submission 
without any considerable changes. The combined tool is implemented as a wrap- 
per script that encapsulates the main verifier invocation and does the following: 


— extracts the property specification and verification task from the arguments; 

— runs the slicer with a timeout of 400s (the sliced program is written to an intermediate C file);

— runs the CPACHECKER configuration ldv-bam-svcomp on the sliced program;

— post-processes the witness produced by CPACHECKER. 


The slicer (named CRUDE_SLICER) is implemented as a plugin to FRAMA-C [2], 
an extensible platform for source-code analysis of C software. The plugin imple- 
mentation does not interact with other FRAMA-C plugins and only makes use of the 
FRAMA-C kernel. The plugin also uses the OCAMLGRAPH [5] library. Both the FRAMA-C platform and the CRUDE_SLICER plugin are implemented in OCaml.

The witness post-processing stage currently simply removes the character offsets from the resulting witness (the line numbers are preserved using line directives supported by CPACHECKER) and substitutes the checksum of the original program source.

Since the SoftwareSystems category of the competition also contains memory safety (and overflow) verification tasks, the submission also includes the memory safety configuration smg-ldv, based on the shape analysis presented in [6].
3 Evaluation of the Approach


The slicer is currently able to handle only reachability verification tasks. It was evaluated on 2734 tasks from the Systems_DeviceDriversLinux64_ReachSafety subcategory of the SV-COMP’18 benchmarks on Intel Xeon E3-1230 v5 (3.4GHz) machines in the competition setting. The submitted configuration with slicing was compared to the baseline CPA-BAM-BnB [7,8] configuration (-ldv-bam-svcomp) without slicing, which was also submitted to this year’s competition. The results are presented in the following table:


Verdicts        | New (+) | Lost (−) | Total
TRUE verdicts   | 151     | 10       | 2252
FALSE verdicts  | 97      | 11       | 267

Speedup: minimum 0.03×, maximum 18.59×, average 1.17×


The table presents the results for correct verdicts only and does not take 
witness checking into account. 

There are two significant limitations of the approach. First, the slicing is performed under the assumption that all possible execution paths in the verified program are finite. This does not lead to unsoundness, since reachability (as a safety property) can be assumed to be violated only on finite paths. However, there are 3 wrong FALSE verdicts reported on benchmarks where an error location is spuriously reached after passing through an infinite loop removed by the slicer. Second, the resulting tool cannot produce precise witnesses, both because of imprecision in source code locations and (more importantly) because neither invariants nor error paths are available for the sliced-out parts of the code. This caused 1090 TRUE verdicts and all FALSE verdicts to fail to be confirmed by the witness checkers in the competition.

The time required for slicing varies from 0.08 to 1905.47s, with an average of 14.82s. Therefore, in the submission the slicer is run with a timeout of 400s, and the remaining tasks (17 out of 2734 in the evaluation) are passed to the main verifier without slicing.
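The slice-with-timeout fallback can be sketched in C with the POSIX calls fork, execvp, and waitpid: the slicer runs as a child process, is killed if it exceeds its time budget, and the main verifier then receives the original task instead of the sliced one. This is only an illustration under assumptions; the actual submission performs this step in its wrapper script, and `run_with_timeout`, `verifier_input`, and `StepResult` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical outcome of an external step such as the slicer. */
typedef enum { STEP_OK, STEP_FAILED, STEP_TIMEOUT } StepResult;

/* Runs argv[0] with the NULL-terminated argument vector argv and kills it
   if it is still running after timeout_sec seconds (1 s granularity). */
StepResult run_with_timeout(char *const argv[], unsigned timeout_sec) {
    pid_t pid = fork();
    if (pid < 0) return STEP_FAILED;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                                  /* only reached if exec failed */
    }
    time_t start = time(NULL);
    struct timespec poll_interval = {0, 10000000L};  /* poll every 10 ms */
    for (;;) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? STEP_OK : STEP_FAILED;
        if (r < 0) return STEP_FAILED;
        if (difftime(time(NULL), start) >= timeout_sec) {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);                /* reap the killed child */
            return STEP_TIMEOUT;
        }
        nanosleep(&poll_interval, NULL);
    }
}

/* The main verifier gets the sliced program only if slicing finished in time;
   otherwise it falls back to the original, unsliced task. */
const char *verifier_input(StepResult slicing, const char *original, const char *sliced) {
    return slicing == STEP_OK ? sliced : original;
}
```

Polling with `WNOHANG` keeps the sketch short; a production wrapper would more likely rely on a dedicated timeout mechanism.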


4 Tool Setup and Configuration 


The submission is available for download as a ZIP archive named cpa-bam-slicing.zip from the SV-COMP repository at the following URL: https://gitlab.com/sosy-lab/sv-comp/archives/tree/master/2018. The submission includes CPACHECKER version 1.6.1 and a statically linked version of FRAMA-C Sulfur-20171101-beta with the CRUDE_SLICER plugin. The version of the plugin corresponds to commit fcd3b927. CPACHECKER requires a Java 8 runtime environment. The invocation of the slicer is embedded in the CPACHECKER wrapper script, so the whole tool has to be executed with the following command line:

scripts/cpa.sh -ldv-bam-svcomp -disable-java-assertions -heap 10000m -spec prop.prp program.c

The tool participates in the SoftwareSystems category; the corresponding benchmark definition is cpa-bam-slicing.xml.
Abstract. InterpChecker is a tool for verifying safety properties of C
programs. It reduces the state space of programs throughout the verifi- 
cation via two new kinds of interpolations and associated optimization 
strategies. The implementation builds on the open-source, configurable 
software verification tool, CPAChecker. 


1 Verification Approach 


Our approach to scalable CEGAR-based model checking is to exploit Craig inter- 
polation [3] to learn abstractions that can systematically reduce the program 
state space which must be explored for a given safety verification problem. In 
addition to the interpolants for parsimonious abstraction [4] (called reachability 
interpolants (R-Interp) here for clarity), we introduce two new kinds of inter- 
polants, called universal safety interpolants and existential error interpolants. 


— A universal safety interpolant (or S-Interp) is useful for determining whether 
all the paths emanating from a state are safe, without exploring all the pos- 
sible branches from it. 

— An existential error interpolant (or E-Interp) is useful for determining 
whether there exists an unsafe path emanating from a state, without exploring 
all the possible branches from it. 


The S-Interp at a location of a control flow graph (CFG) collects predicates that 
are relevant to a yes-instance of the safety verification, so that whenever the S- 
Interp is implied by the current path, all paths emanating from this location are 
guaranteed to be safe. Dually, whenever the E-Interp at a location of a CFG 
is implied by the current path, there is an unsafe branch from it, and so, one 
can immediately conclude that the program is unsafe. We learn S-Interp and 
E-Interp from spurious error traces and apply them to reduce the state space of programs throughout the CEGAR-based program verification process. For convenience, we denote a CFG as a tuple $G = (L, T, l_0, f)$, where $L$ is the set of program locations, $l_0 \in L$ is the initial location, $f \in L$ is the final location, $T \subseteq L \times Ops \times L$ is the transition relation, and $Ops$ is the set of instructions.

When verifying a program, we first unwind the CFG to generate an Abstract Reachability Tree (ART). An ART $A = (S_A, E_A)$, obtained from a CFG $G = (L, T, l_0, f)$, consists of a set $S_A$ of abstract states and a set $E_A$ of edges. An abstract state $s \in S_A$ is a triple $s = (l, c, p)$, where $l$ is a location in the CFG, $c$ is the current call stack, and $p$ is an abstract predicate describing the reachable region of the current state, which is determined by the reachability interpolant R-Interp. Given two states $s$ and $s'$, we say that $s$ is covered by $s'$ if $s[0] = s'[0]$, $s[1] = s'[1]$, and $s[2] \Rightarrow s'[2]$. (Notation: for a tuple $e$, we write $e[i]$ for the $i$-th component of $e$, counting from 0.) Further, if $s$ is covered by $s'$ and the entire future of $s'$ (i.e., all abstract states reachable from $s'$) has been explored, then it is safe not to explore the future of $s$. A branch (path) $\Pi$ of an ART, denoting a possible execution of the program, is a finite alternating sequence of states and edges, $\Pi = (s_0, e_0, \ldots, e_{n-1}, s_n)$, such that for all $0 \le i < n$, $e_i[0] = s_i$ and $e_i[2] = s_{i+1}$. Given a path $\Pi$ of an ART, we write $P_f(\Pi)$ for the path formula $\mathrm{SSA}(e_0[1]) \wedge \cdots \wedge \mathrm{SSA}(e_{n-1}[1])$ obtained from $\Pi$. Here $\mathrm{SSA}(op)$ is the static single assignment (SSA) form of an operation $op$, in which every variable occurring in $\Pi$ is assigned a value at most once.
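The coverage relation can be made concrete under a strong simplifying assumption: each abstract predicate is a conjunction of atoms drawn from a fixed predicate set and is stored as a bitmask, so that $p \Rightarrow p'$ holds whenever every atom of $p'$ also occurs in $p$. The `AbsState` layout, the atom encoding, and the integer call-stack identifier are hypothetical; a real implementation such as one built on CPAChecker would decide implication with a solver rather than this syntactic test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical abstract state s = (l, c, p). */
typedef struct {
    int location;    /* l: CFG location */
    int call_stack;  /* c: call stack, abstracted to an identifier */
    uint32_t atoms;  /* p: conjunction of predicate atoms, one bit per atom */
} AbsState;

/* Sufficient (syntactic) check for p => q on conjunctions of atoms:
   every atom required by q must also be present in p. */
static bool implies(uint32_t p, uint32_t q) {
    return (p & q) == q;
}

/* s is covered by s2 if s[0] = s2[0], s[1] = s2[1], and s[2] => s2[2]. */
bool is_covered(const AbsState *s, const AbsState *s2) {
    return s->location == s2->location &&
           s->call_stack == s2->call_stack &&
           implies(s->atoms, s2->atoms);
}
```

With this encoding, the empty mask stands for `true`, so any state at the same location and call stack is covered by a state whose predicate is `true`.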

Given a CFG whose locations are enriched with default values of R-Interp, S-Interp, and E-Interp, we construct the ART for exploring a real counterexample by starting from the root, i.e., $s_0 : (l_0, -, \mathit{true})$. The flowchart in Fig. 1 gives a bird’s eye view of our approach to safety verification with reachability, safety and error interpolations. When a state $s : (l, c, p)$ is being explored and $l$ is not an error location:

(1) Backtrack along the current path to explore other possibilities if one of the following three conditions holds:
• $p = \mathit{false}$;
• $p \neq \mathit{false}$, $F(l) = f$, and $P_f(s_0, \ldots, s) \Rightarrow I(l)$;
• $p \neq \mathit{false}$ and $s$ is covered by a visited state $s'$.
(2) Report that the program is unsafe if $p \neq \mathit{false}$ and $P_f(s_0, \ldots, s) \Rightarrow \text{E-Interp}(l)$.
(3) Otherwise, explore the successor state $s'' : (\mathit{suc}(l), c, p')$.

When the location $l$ of the current state $s : (l, c, p)$ is an error location, we first check whether the current path $\Pi = (s_0, \ldots, s)$ is spurious. If $\Pi$ is not spurious, we conclude that the program is unsafe. Otherwise, the S-Interp, E-Interp, and R-Interp of the locations involved in $\Pi$ are updated by update S-Interp, update E-Interp, and update R-Interp [5], respectively. Subsequently, we backtrack along the current path for other possibilities and treat each new current state $s : (l, c, p)$ in the same way, until the program is reported as unsafe or there are no more states to be explored.

Fig. 1. Interpolation-aided CEGAR approach for program verification

To maximise the effect of the proposed interpolations, we also present two optimization strategies: CFG pruning and weight-guided search. In real-world programs, there may be locations in a CFG that can never reach any error location. To avoid exploring these locations when verifying the program, the first strategy prunes the CFG by removing these locations and the related control flow edges in advance. A safety interpolant works only when it is full. Hence, the earlier full safety interpolants are formed, the more effective they are. To form full interpolants, the second strategy explores one side of a branch as early as possible if the other side has already been explored. This is achieved by attaching a weight attribute to the transitions of a CFG; throughout the verification, the branch with the largest weight is explored first.
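The first strategy, CFG pruning, amounts to a backward reachability computation: starting from the error locations, walk transitions in reverse, and every location never reached this way cannot lead to an error and can be removed together with its edges. The sketch below covers only that strategy and assumes a hypothetical adjacency-matrix `Cfg` with a bounded number of locations; it is not InterpChecker's Java implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCS 32  /* hypothetical bound on the number of CFG locations */

/* Hypothetical CFG: edge[u][v] is true if some transition leads from u to v. */
typedef struct {
    int n;
    bool edge[MAX_LOCS][MAX_LOCS];
    bool is_error[MAX_LOCS];
} Cfg;

/* Backward reachability from the error locations: keep[l] becomes true iff
   some error location is reachable from l. Locations with keep[l] == false
   can be pruned together with their incident edges. */
void locations_reaching_error(const Cfg *g, bool keep[MAX_LOCS]) {
    int worklist[MAX_LOCS];
    int top = 0;
    for (int l = 0; l < g->n; l++) {
        keep[l] = g->is_error[l];
        if (keep[l]) worklist[top++] = l;
    }
    while (top > 0) {
        int v = worklist[--top];
        for (int u = 0; u < g->n; u++) {  /* visit predecessors of v */
            if (g->edge[u][v] && !keep[u]) {
                keep[u] = true;
                worklist[top++] = u;
            }
        }
    }
}
```

Running this once before verification costs time linear in the size of the matrix, which is small compared with exploring the pruned parts of the ART.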


2 Software Architecture 


Our implementation of InterpChecker builds on the open-source, configurable 
software verification tool, CPAChecker [1]. Like CPAChecker, InterpChecker can 
verify safety properties of C programs via reachability checking of the instrumented error labels. All extra functions are implemented in Java, using the
existing libraries provided by CPAChecker. In Fig. 1, the white parts are new, 
while the grey parts are original CPAChecker functions. We set up the Inter- 
pChecker interpolants and optimizations as an option of CPAChecker, organised 
as a refinement-selection configuration, in the sense of [2]. 


3 Strengths and Weaknesses


The new interpolants implemented in InterpChecker do not affect the existing configurations of CPAChecker. InterpChecker supports the verification of safety properties of C programs via reachability checking of the instrumented error labels. The power of InterpChecker is best illustrated when analysing large-scale programs, because it can avoid exploring many paths. The current version does not support the verification of properties written as temporal logic formulas. Like CPAChecker, we skip recursive functions and treat them as pure functions. Thus, false negatives may occur for programs with recursive functions.


4 Tool Setup and Configuration 


A zipped file containing InterpChecker 1.0 is available at http://github.com/duanzhao-dz/interpchecker. It contains all the required libraries: no installation
of external tools is required. To run InterpChecker, first download the code from 
the website, then run the following command to install the package in Ubuntu 
16.04: sudo apt-get install openjdk-8-jdk. 

To process a benchmark example test.c, invoke the script with the following command: ./scripts/cpa.sh -sv-comp18-interpcpachecker test.c. The output of InterpChecker is written to the file output/Statistics.txt. When using BenchExec, the output can be translated by the interpchecker.py tool-info module. The categories verified by the competition candidate are listed in the file interpchecker.xml. Both files are contained in the zipped file. If the checked property does not hold, a human-readable counterexample is written to output/ErrorPath.txt and an error witness is written to the compressed file witness.graphml.gz. Note that a Java Runtime Environment compatible with at least Java 8 is required.


5 Software Project and Contributors 


Based on the open source tool CPAChecker, InterpChecker is developed by 
Xidian University, China, and the University of Oxford, UK. We thank Dirk 
Beyer and his team for their original contributions to CPAChecker. 
Abstract. Map2Check is a bug hunting tool that automatically checks safety properties in C programs. It tracks memory pointers and variable assignments to check user-specified assertions, overflow, and pointer safety. Here, we extend Map2Check to: (i) simplify the program using Clang/LLVM; (ii) perform a path-based symbolic execution using the KLEE tool; and (iii) transform and instrument the code using LLVM, based on dynamic information flow. The SVCOMP’18 results show that Map2Check can be effective in generating and checking test cases related to memory management of C programs.


1 Overview 


Map2Check v7.1 uses source code instrumentation based on dynamic information flow to monitor data from different program executions. Map2Check automatically produces concrete inputs to the program via symbolic execution, in order to execute different program paths and to detect failures related to arithmetic overflow, invalid deallocation, invalid pointers, and memory leaks. Map2Check uses Clang [5] as a front-end, which supports the main C standards, e.g., C99 (ISO/IEC 9899:1999). In its previous version [7], Map2Check was able to automatically generate test cases to check memory management using bounded model checkers (e.g., ESBMC [4]). The main original contributions of Map2Check v7.1 are: (i) adding Clang [5] as a front-end to improve the symbolic execution of C programs; (ii) adopting the LLVM [6] framework as a code transformation engine; and (iii) integrating the KLEE [1] tool as a symbolic execution engine to automatically explore different program paths.


2 Verification Approach 


The Map2Check tool is inspired by LEAKPOINT [3] and Symbiotic 4 [2], which 
use compiler techniques to analyze C programs using code instrumentation. The 
main novelty of Map2Check v7.1 is the integration of the LLVM Intermediate Representation (IR) to analyze and verify C programs. This LLVM IR is based on the static single assignment representation and provides type safety, low-level operations, and the capability of representing high-level languages. Compared with related tools, e.g., Symbiotic 4, Map2Check does not perform static program slicing and does not use the symbolic execution of KLEE to directly explore the program state space. Map2Check applies source code instrumentation to monitor and gather areas of data memory from different concrete program executions; this code instrumentation focuses on exploring dynamic information flow to avoid the need for an approximate static analysis. Similarly
to LEAKPOINT, Map2Check taints program data (e.g., variables or memory 
locations) with a taint mark metadata and then propagates the taint marks over 
the concrete program executions. Fig. 1 shows an overview of the Map2Check 
verification flow. The tool input is a C program and a safety property (e.g., over- 
flow and pointer safety); it returns TRUE (if there is no path that violates the 
safety property), FALSE (if there exists a path that violates the safety property), 
or UNKNOWN otherwise. 


[Figure: panels Map2Check Library, Symbolic Execution, and Verification (Start Verification, Violation Witness, Finish), showing a pointer-tracking log and LLVM IR instrumented with calls such as map2check_alloca and map2check_free.]

Fig. 1. Map2Check verification flow.


The Map2Check verification flow has the following main steps: (A) convert the C code into LLVM IR using Clang [5]; (B) apply specific code optimizations, e.g., dead code elimination and constant propagation; (C) add Map2Check library functions to track pointers, and add assertions into the LLVM bitcode; (D) link the code instrumented by Map2Check with the Map2Check library so that its functions can be executed; (E) apply further Clang optimizations to improve the symbolic execution (e.g., canonicalize natural loops and promote memory to registers); (F) generate concrete inputs for the Map2Check instrumented functions by performing symbolic execution of the analyzed code in LLVM IR using KLEE; and (G) generate witnesses: if a safety property is violated, then a “violation witness” is produced using the KLEE output to trace the error location; if there is no path that violates the safety property, then a “correctness witness” is produced, which identifies each basic block executed in the control flow graph of the LLVM IR using the concrete inputs produced by KLEE (LLVM syntactically enforces some of those basic blocks as invariants from its assignments).

Map2Check Using LLVM and KLEE 439

Map2Check v7.1 tracks important data of the analyzed C code to identify 
functions and operations over pointers. Then, it checks the respective assertions 
via symbolic execution, which produces inputs to concretely execute the pro- 
gram. In particular, Map2Check tracks the heap memory used by the analyzed 
code using the following data log lists: Heap log tracks the allocated memory 
address (i.e., arguments of functions, functions, and variables) and its memory 
size in the heap memory; Malloc log tracks the addresses that are dynamically 
allocated/deallocated, their size and pointer actions (allocation and dealloca- 
tion), executed at the current program location; and List log stores data about 
operations over pointers, e.g., the code line number for each operation, program 
scope, variable name, memory addresses, and addresses pointed to by program 
variables. 
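The Malloc log can be pictured as a small table of allocation records, which is enough to express the pointer-safety checks named in this section: a deallocation of an untracked address (invalid deallocation), a second deallocation of the same block, and blocks still allocated at program exit (memory leak). The sketch below is hypothetical: Map2Check's real library is called from instrumented LLVM IR, and `MallocLog`, `log_malloc`, `log_free`, and `count_leaks` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LOG 64  /* hypothetical capacity of the log */

/* One hypothetical Malloc log entry: address, size, and whether it was freed. */
typedef struct {
    uintptr_t addr;
    size_t size;
    bool freed;
} MallocEntry;

typedef struct {
    MallocEntry entries[MAX_LOG];
    int count;
} MallocLog;

typedef enum { FREE_OK, FREE_INVALID, FREE_DOUBLE } FreeCheck;

/* Record an allocation (called where an instrumented malloc returns). */
bool log_malloc(MallocLog *ml, uintptr_t addr, size_t size) {
    if (ml->count == MAX_LOG) return false;
    ml->entries[ml->count++] = (MallocEntry){addr, size, false};
    return true;
}

/* Check a deallocation. The newest entry is searched first, so an address
   reused by a later malloc matches its most recent allocation. */
FreeCheck log_free(MallocLog *ml, uintptr_t addr) {
    if (addr == 0) return FREE_OK;  /* free(NULL) is a no-op in C */
    for (int i = ml->count - 1; i >= 0; i--) {
        MallocEntry *e = &ml->entries[i];
        if (e->addr != addr) continue;
        if (e->freed) return FREE_DOUBLE;
        e->freed = true;
        return FREE_OK;
    }
    return FREE_INVALID;            /* address was never allocated */
}

/* At program exit, every entry that was not freed is a memory leak. */
int count_leaks(const MallocLog *ml) {
    int leaks = 0;
    for (int i = 0; i < ml->count; i++)
        if (!ml->entries[i].freed) leaks++;
    return leaks;
}
```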

Map2Check v7.1 implements a function map2check_non_det_x, with x ranging over the supported C data types (e.g., char, int, and float), which KLEE interprets as a model of non-deterministic values. In this respect, Map2Check v7.1 differs from its previous version, which modeled non-deterministic values with a function returning a random number drawn from a probability distribution. To check the unreachability of an error location, Map2Check identifies a given target function (e.g., __VERIFIER_error) and replaces each call to it with an error assertion. To check overflow, Map2Check adds an assertion before every arithmetic instruction over integers, which checks the result of the signed operation against the maximum and minimum integer values. To check pointer safety, Map2Check checks whether an address to be deallocated is tracked in the Malloc log list and then identifies whether that memory has already been deallocated for that program location (invalid deallocation); Map2Check also identifies whether allocated memory has not been released at the end of the program execution (memory leak). Additionally, Map2Check analyzes the memory addresses in the Malloc log and Heap log lists to identify whether those addresses point to a valid address (invalid pointer). Map2Check does not distinguish between the usual “valid-memtrack” and “valid-memclean” properties in SV-COMP.
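A C-level analogue of the overflow assertion is sketched below. Map2Check inserts its checks into LLVM IR; this sketch instead checks the operands of a signed addition or subtraction before the operation, which is equivalent to comparing the mathematical result against `INT_MIN` and `INT_MAX` but avoids evaluating an overflowing expression (signed overflow is undefined behavior in C). The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* a + b leaves [INT_MIN, INT_MAX] iff these conditions hold;
   the bounds INT_MAX - b and INT_MIN - b cannot themselves overflow. */
bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}

/* a - b leaves [INT_MIN, INT_MAX] iff these conditions hold. */
bool sub_would_overflow(int a, int b) {
    if (b < 0) return a > INT_MAX + b;
    if (b > 0) return a < INT_MIN + b;
    return false;
}

/* Hypothetical instrumented form of `int c = a + b;`. */
int checked_add(int a, int b) {
    assert(!add_would_overflow(a, b) && "signed overflow on +");
    return a + b;
}
```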


3 Proposed Architecture 


Map2Check v7.1 is implemented as a source-to-source transformation tool in C/C++ using LLVM (v3.8.1). It uses Clang (v3.8.1) as a front-end to parse a C program and to generate the respective LLVM bitcode, which is used in the code transformation that tracks pointers to areas of memory and variable assignments (Fig. 2). It uses KLEE (v1.2.0) as a path-based symbolic execution engine; STP¹ (v2.1.2) is used as the SMT solver by KLEE to check constraints over bit-vectors and arrays. The Boost² C++ library is used as a helper library, e.g., to generate the witness in the GraphML format. Map2Check participates in SVCOMP’18 (as in the map2check.xml benchmark definition) in the following categories: ReachSafety-Arrays, ReachSafety-BitVectors, ReachSafety-Heap, ReachSafety-Loops, ReachSafety-Recursive, MemSafety, and NoOverflows.

Fig. 2. Map2Check architecture flow.


3.1 Availability and Installation 


Map2Check v7.1 (for 64-bit Linux) is available³ under the GPL license. The Clang, LLVM, KLEE, and STP tools are included in the Map2Check distribution. Map2Check is invoked from the command line (as in the map2check.py tool-info module for BenchExec) as follows:


./map2check-wrapper.py -p propertyFile.prp file.i 


Map2Check accepts the property file and the verification task and returns one of the following results: TRUE + Witness, FALSE + Witness, or UNKNOWN. For each error-path or correctness witness, a file (called witness.graphml) containing the witness is generated in the Map2Check root folder.


4 Strengths and Weaknesses of the Approach 


Map2Check exploits dynamic information flow by tainting program data. It uses Clang/LLVM as an industrial-strength compiler to simplify and instrument the code, and it employs KLEE to produce concrete inputs for different program executions. The integration of LLVM and KLEE opens up several possibilities to implement new testing and verification techniques in Map2Check. In particular, we intend to improve our symbolic execution by synthesizing inductive invariants to prove properties of loops and recursive programs and also to prune the search space, given that Map2Check bounds loops and recursion up to a given depth k. The SVCOMP’18 results show that Map2Check can be effective in generating and checking test cases for memory management in C programs. Map2Check achieved a score of 228 in the MemSafety category without a single incorrect result; in particular, Map2Check produced the highest score (i.e., 106) in the MemSafety-Arrays subcategory. In the NoOverflows category, Map2Check achieved a score of −263; some incorrect results are due to our imprecise overflow check. In the ReachSafety category, Map2Check reported 312 correct results; however, it also reported 16 incorrect true and 1 incorrect false results. Some of these incorrect results are related to Map2Check’s limited handling of loops and recursion.

³ https://github.com/hbgit/Map2Check/archive/map2check_v7.1-svcomp18d.zip
Abstract. The fifth version of SYMBIOTIC significantly improves the instrumentation capabilities that the tool uses to participate in the category MemSafety. It leverages an extended pointer analysis redesigned for instrumenting programs with memory safety errors, and staged instrumentation that reduces the number of inserted function calls that track or check the memory state. Apart from various bugfixes, we have ported SYMBIOTIC (including the external symbolic executor KLEE) to LLVM 3.9 and improved the generation of violation witnesses by providing values of some variables.


1 Verification Approach 


The basic approach of SYMBIOTIC remains unchanged [7]: it uses instrumentation to reduce checking of specific properties (e.g., no-overflow or memory safety) to checking reachability of error locations. Then we apply slicing, which removes the code that has no influence on the reachability of these locations. Finally, we symbolically execute the sliced code using KLEE [1] to refute or confirm that an error location is reachable.

For many years, our attention was focused mainly on slicing [2,6,8]. Only in 2016 did we implement a configurable instrumentation that enabled SYMBIOTIC to check memory safety or, in general, any safety property. Consequently, SYMBIOTIC 4 [4] participated for the first time in the category MemSafety, where it won the bronze medal.

Fig. 1. Quantile plot of running times of the three considered configurations of SYMBIOTIC 5. On the x-axis are the benchmarks sorted according to their running times; on the logarithmic y-axis are the times.

The instrumentation used in SYMBIOTIC 4 to check memory safety inserts calls to functions that track every block of allocated memory and calls to functions that check the validity of dereferences using the tracked information. A check is not inserted if a static pointer analysis guarantees that the dereferenced pointer points to a memory block that was allocated before. Later, we recognized a flaw in this optimization: a standard pointer analysis ignores memory deallocations; hence, it can tell that a pointer may point to memory blocks allocated by specific program lines, but it does not tell whether these memory blocks are still allocated. As a result, SYMBIOTIC 4 sometimes does not insert a check even if the dereference may be invalid, and thus it may miss some bugs.
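To make the flaw concrete, the following Python sketch models the two kinds of inserted calls: tracking calls at allocations and deallocations, and checks before dereferences. It is an illustrative assumption, not SYMBIOTIC's actual C runtime; the class and method names (`MemoryTracker`, `on_malloc`, `on_free`, `check_deref`) are hypothetical. A static fact such as "p points to the block allocated at this line" stays true after `free`, but only a check that consults the set of still-live blocks catches the later use.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MemoryTracker:
    """Hypothetical model of the inserted tracking and checking calls."""

    # Start address of every block that is currently allocated -> its size.
    live_blocks: Dict[int, int] = field(default_factory=dict)

    def on_malloc(self, start: int, size: int) -> None:
        # Tracking call inserted after an allocation.
        self.live_blocks[start] = size

    def on_free(self, start: int) -> None:
        # Tracking call inserted before a deallocation; freeing an address
        # that is not the start of a live block is itself an error.
        if start not in self.live_blocks:
            raise MemoryError(f"invalid free of address {start}")
        del self.live_blocks[start]

    def check_deref(self, addr: int, width: int) -> None:
        # Check inserted before a dereference: the accessed bytes
        # [addr, addr + width) must lie inside a block that is still live.
        for start, size in self.live_blocks.items():
            if start <= addr and addr + width <= start + size:
                return
        raise MemoryError(f"invalid dereference of {width} bytes at {addr}")


# A use-after-free that a deallocation-unaware pointer analysis misses.
tracker = MemoryTracker()
tracker.on_malloc(0x1000, 8)        # p = malloc(8)
tracker.check_deref(0x1000, 4)      # *p is fine
tracker.on_free(0x1000)             # free(p)
try:
    tracker.check_deref(0x1000, 4)  # *p again: the error location is reached
except MemoryError as err:
    print(err)
```

In the instrumented program, raising `MemoryError` corresponds to reaching an error location, which the backend then tries to confirm or refute.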

In SYMBIOTIC 5, we have fixed and significantly boosted the instrumentation. First, we have extended the above-mentioned pointer analysis so that it also takes deallocations into account. Second, the instrumentation now works in two stages. The first stage inserts checks wherever the extended pointer analysis cannot guarantee that a dereference is safe. Moreover, compared to SYMBIOTIC 4, we use simpler checks where possible. For example, if the pointer analysis says that a given pointer points into a known fixed-size memory block, we just insert a check that the pointer's offset is within the size of the block (without searching the tracked information about the block). The second stage inserts calls to memory-tracking functions only at allocations of memory blocks that can be accessed by some dereference instrumented in the first stage. Hence, we track only the information that may actually be used in the checks.
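The two stages can be sketched as a planning pass over dereferences. This is a hypothetical Python model, not SYMBIOTIC's code: each dereference carries the allocation sites its pointer may target (or `None` if unknown) and a flag saying whether the extended pointer analysis already proves it safe. As simplifications of our own, we assume that the cheap offset check needs no tracking at all, and that a dereference with an unknown target forces tracking of every allocation site.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Deref:
    """A dereference together with (hypothetical) pointer-analysis results."""
    name: str
    # Allocation sites the pointer may point to; None means "unknown".
    targets: Optional[Tuple[str, ...]]
    # True if the extended pointer analysis proves the access safe.
    proven_safe: bool = False


def plan_instrumentation(
    derefs: List[Deref], block_sizes: Dict[str, Optional[int]]
) -> Tuple[Dict[str, str], Set[str]]:
    """block_sizes maps each allocation site to its fixed size, or None."""
    # Stage 1: decide which dereferences need a check, and which kind.
    checks: Dict[str, str] = {}
    for d in derefs:
        if d.proven_safe:
            continue
        single_fixed_block = (
            d.targets is not None
            and len(d.targets) == 1
            and block_sizes.get(d.targets[0]) is not None
        )
        # Cheap check: offset within the size of the known block, no lookup.
        # Otherwise: search the tracked information for the pointer.
        checks[d.name] = "offset-check" if single_fixed_block else "lookup-check"

    # Stage 2: track only allocations that some lookup check may access.
    tracked: Set[str] = set()
    for d in derefs:
        if checks.get(d.name) == "lookup-check":
            tracked.update(block_sizes if d.targets is None else d.targets)
    return checks, tracked
```

For example, with dereferences `d2` into a single fixed-size block `B` and `d3` into either `A` or `C`, only `A` and `C` end up tracked: `B` is accessed only through the cheap offset check.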

To evaluate the boosted instrumentation, we ran the following three configurations of SYMBIOTIC on 393 benchmarks of the SV-COMP 2017 meta category MemSafety and of the category MemSafety-TerminCrafted:

• basic uses instrumentation without any pointer analysis,
• ePTA uses the extended pointer analysis (i.e., it is a fixed version of the instrumentation in SYMBIOTIC 4),
• staged uses the extended pointer analysis and staged instrumentation.


Figure 1 clearly shows that the performance improvement brought by the 
extended pointer analysis itself is negligible compared to the performance 
improvement delivered by the extended pointer analysis in combination with 
staged instrumentation. For a precise description of the boosted instrumenta- 
tion, experimental setup and results, we refer to [3]. 

SYMBIOTIC 5 also changes the approach to error witness generation. SYMBIOTIC 4 described an erroneous run by the sequence of program locations it passes through. This sequence was often very long, and it turned out to be too restrictive for witness checkers. SYMBIOTIC 5 provides only the starting and target locations of the run and the return values of some __VERIFIER_nondet* calls. More precisely, we provide the return values of calls that are located in main and are executed just once in the run. The witnesses are now confirmed by witness checkers more often.
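The selection rule for witness values can be written as a short filter. We assume, for illustration only, that the error run is available as a list of `(caller, call_site, value)` events for `__VERIFIER_nondet*` calls; this event format and the function name `witness_values` are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical event: (calling function, call-site id, value returned).
NondetEvent = Tuple[str, str, int]


def witness_values(run: List[NondetEvent]) -> List[Tuple[str, int]]:
    """Return (call site, value) for nondet calls located in main whose
    call site is executed exactly once in the error run."""
    executions = Counter(site for _, site, _ in run)
    return [
        (site, value)
        for caller, site, value in run
        if caller == "main" and executions[site] == 1
    ]
```

Calls in other functions, and call sites executed repeatedly (e.g., inside a loop), are left out, which keeps the witness short and avoids over-constraining the witness checker.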


2 Software Architecture 


All components of SYMBIOTIC are built on top of LLVM 3.9 [9]. We use the CLANG compiler to compile the analyzed sources into LLVM bitcode. SYMBIOTIC consists of Python scripts that distribute work among three basic modules, all written in C++:


Instrumentation module. This module inserts function calls at instructions according to a given configuration in JSON. The functions called by the instrumentation are implemented in C and compiled to LLVM automatically by SYMBIOTIC before the instrumentation process. We use this configurable instrumentation only for the memory safety property. For the no-overflow property, we use CLANG's sanitizer, as it works sufficiently well in this case.

Slicing module. This module implements an interprocedural version of the slicing algorithm based on dependence graphs [5], together with the analyses needed to compute dependencies between instructions, i.e., pointer analyses (including the extended pointer analysis described in Sect. 1, which is also used by the instrumentation) and reaching definitions analysis.

Verification backend. For deciding reachability of error locations, we currently use our clone of the open-source symbolic executor KLEE [1], which we ported to LLVM 3.9 and modified to support error witness generation.


Before and after slicing, we optimize the code using the available LLVM optimizations. The remaining bitcode transformations, which are mostly technical (e.g., replacing calls inserted by CLANG's sanitizer with __VERIFIER_error calls), are implemented as LLVM passes. All components that transform bitcode take bitcode as input and produce valid bitcode as output. This makes SYMBIOTIC highly modular: any part (module) can easily be replaced or used as a stand-alone tool.


3 Strengths and Weaknesses 


The main strengths of the approach are its universality and modularity. The instrumentation can reduce any safety property to reachability checks, and therefore no special monitors need to be incorporated into the verification backend. Indeed, any tool that can decide reachability of error locations can be plugged in.

The main disadvantage of the current configuration is that symbolic execution does not handle programs with unbounded loops satisfactorily. Moreover, KLEE cannot generate loop invariants.


4 Tool Setup and Configuration


Download: https://github.com/staticafi/symbiotic/releases/download/5.0.1/symbiotic-5.0.1.zip.
Installation: Unpack the archive.
Participation Statement: SYMBIOTIC 5 participates in all categories.
Execution: Run bin/symbiotic OPTS <source>, where available OPTS include:

• --prp=file, which sets the property specification file to use,
• --witness=file, which sets the output file for the witness,
• --32, which sets the 32-bit environment,
• --help, which shows the full list of possible options.


5 Software Project and Contributors


SYMBIOTIC 5 has been developed by M. Chalupa and M. Vitovská under the supervision of J. Strejček. The tool and its components are available under the Apache-2.0 and MIT licenses. The project is hosted by the Faculty of Informatics, Masaryk University. LLVM and KLEE are also available under open-source licenses. The project web page is: https://github.com/staticafi/symbiotic.
Abstract. ULTIMATE AUTOMIZER is a software verifier that generalizes proofs for traces to proofs for larger parts of the program. In recent years, the portfolio of proof producers available to ULTIMATE has grown continuously. This is not only because more trace analysis algorithms have been implemented in ULTIMATE, but also due to the continuous progress in the SMT community. In this paper, we explain how ULTIMATE AUTOMIZER dynamically selects trace analysis algorithms and how the tool decides when proofs for traces are “good” enough to be used in the abstraction refinement.


1 Verification Approach 


ULTIMATE AUTOMIZER (in the following called AUTOMIZER) is a software veri- 
fier that is able to check safety and liveness properties. The tool implements an 
automata-based [6] instance of the CEGAR scheme. In each iteration, we pick a 
trace (which is a sequence of statements) that leads from the initial location to 
the error location and check whether the trace is feasible (i.e., corresponds to an 
execution) or infeasible. If the trace is feasible, we report an error to the user; 
otherwise we compute a sequence of predicates along the trace as a proof of the 
trace’s infeasibility. We call such a sequence of predicates a sequence of inter- 
polants since each predicate “interpolates” between the set of reachable states 
and the set of states from which we cannot reach the error. In the refinement step 
of the CEGAR loop, we try to find all traces whose infeasibility can be shown 
with the given predicates and subtract these traces from the set of (potentially 
spurious) error traces that have not yet been analyzed. We use automata to 
represent sets of traces; hence the subtraction is implemented as an automata 
operation. The major difference from classical CEGAR-based predicate abstraction is that we never have to do any logical reasoning (e.g., SMT solver calls) that involves predicates from different CEGAR iterations.
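The loop just described can be sketched in Python under a strong simplification: the set of error traces is finite and stored explicitly, whereas AUTOMIZER represents it by an automaton and subtracts traces via an automata difference operation. The callbacks `is_feasible` and `prove` are hypothetical stand-ins for the SMT-based feasibility check and for the set of traces whose infeasibility a sequence of interpolants shows.

```python
from typing import Callable, Iterable, Optional, Tuple

Trace = Tuple[str, ...]


def cegar(
    error_traces: Iterable[Trace],
    is_feasible: Callable[[Trace], bool],
    prove: Callable[[Trace], Callable[[Trace], bool]],
) -> Optional[Trace]:
    """Return a feasible error trace, or None if every trace is refuted."""
    remaining = set(error_traces)            # error traces not yet analyzed
    while remaining:
        trace = min(remaining)               # pick some trace (deterministically)
        if is_feasible(trace):
            return trace                     # genuine counterexample
        covers = prove(trace)                # proof of infeasibility
        # Refinement: subtract all traces whose infeasibility the proof shows;
        # the analyzed trace itself is always removed to guarantee progress.
        remaining = {t for t in remaining if t != trace and not covers(t)}
    return None
```

For instance, if a proof for the trace `x=0; x>0` also refutes `x=0; y=1; x>0`, both traces disappear in one refinement step, which is exactly the generalization from one trace to a larger part of the program.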

© The Author(s) 2018

We use this paper to explain how our tool obtains the interpolants that are used in the refinement step. The ULTIMATE program analysis framework provides a number of techniques to compute interpolants for an infeasible trace. We group them into the following two categories.


Path program focused techniques (abstract interpretation [5], constraint- 
based invariant synthesis). These techniques do not consider the trace in iso- 
lation but in the context of the analyzed program. The program is projected 
to the statements that occur in the trace; this projection is considered as a 
standalone program called path program. The techniques try to find a Floyd- 
Hoare style proof for the path program, which shows the infeasibility of all 
the path program’s traces. If such a proof is found, the respective predicates 
are used as a sequence of interpolants. These interpolants are “good enough” 
to ensure that in the refinement step all (spurious) error traces of the path 
program are ruled out. 

Trace focused techniques (Craig interpolation, symbolic execution with 
unsatisfiable cores [4]). These techniques consider only the trace. Typically 
they are significantly less expensive and more often successful than tech- 
niques from the first category. However, we do not have any guarantee that 
their interpolants help to prove the infeasibility of more than one trace. 


Recent improvements of AUTOMIZER were devoted to techniques that fall 
into the second category. Our basic paradigms are: (1) use different techniques 
to compute many sequences of interpolants, (2) evaluate the “quality” of each 
sequence, (3) prefer “good” sequences in the abstraction refinement. 
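The three paradigms amount to a simple selection policy, sketched here in Python. The techniques are modeled as hypothetical callables that return a sequence of interpolants (or `None` on failure), and `is_perfect` stands in for the quality check described in the next paragraph; AUTOMIZER's actual implementation is not shown in this paper.

```python
from typing import Callable, List, Optional, Sequence

Interpolants = List[str]
Technique = Callable[[], Optional[Interpolants]]


def select_interpolants(
    techniques: Sequence[Technique],
    is_perfect: Callable[[Interpolants], bool],
) -> Optional[Interpolants]:
    """(1) compute sequences with several techniques, (2) evaluate each,
    (3) prefer a good (here: perfect) one, else fall back to an imperfect one."""
    fallback: Optional[Interpolants] = None
    for technique in techniques:
        sequence = technique()               # None: the technique failed
        if sequence is None:
            continue
        if is_perfect(sequence):
            return sequence                  # best case: use it right away
        if fallback is None:
            fallback = sequence              # keep the first imperfect result
    return fallback
```

Returning as soon as a perfect sequence appears avoids running the remaining techniques; an imperfect sequence is used only when nothing better is found.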

In contrast to related work [3], we have only one measure for the quality of a sequence of interpolants: we check whether the interpolants constitute a Floyd-Hoare annotation of the path program for the trace. If this is the case, we call the sequence a perfect sequence of interpolants. If the sequence is perfect, we use it for the abstraction refinement. If the sequence is not perfect, we use it only if no better sequence is available. Our portfolio of trace focused techniques is quite large for three reasons.


1. We use different algorithms for interpolation. Several SMT solvers have imple- 
mented algorithms for Craig interpolation and we use these as a black box. 
Furthermore, ULTIMATE provides an algorithm [4] to construct an abstraction 
of the trace from an unsatisfiable core provided by an SMT solver. Afterwards, 
two sequences of predicates, one with the sp predicate transformer, the other 
with the wp predicate transformer, are constructed via symbolic execution. 

2. We use different SMT solvers. Typically, different SMT solvers implement 
different algorithms and hence the resulting Craig interpolants or unsatisfiable 
cores are different. 

3. We have several algorithms that produce an abstraction of a trace but preserve the infeasibility of the trace. We can apply these as a preprocessing step for the interpolant computation.

All our algorithms follow the same scheme: we replace all statements of the trace by skip statements. Then we incrementally check feasibility of the trace and undo replacements as long as the trace is feasible. Examples of undo orders used by our algorithms are: (1) first undo statements that occur outside of loops, then follow the nesting structure of loops for further undo operations; (2) the same as (1), but starting inside loops; (3) undo statements with large constants later; (4) first undo statements whose SMT representation is less expensive (e.g., postpone floating-point arithmetic).
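The common scheme of these trace abstraction algorithms can be sketched as follows. For illustration, we assume that statements are simple bound constraints `(variable, op, constant)` and replace the SMT feasibility check by a toy bound check; the undo orders (1) to (4) above would correspond to different permutations passed as `undo_order`. All names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical statement: (variable, op, constant) with op in {"<=", ">="}.
Stmt = Tuple[str, str, int]


def feasible(trace: Sequence[Optional[Stmt]]) -> bool:
    """Toy stand-in for an SMT check: None is skip, the rest are bounds."""
    lower: Dict[str, int] = {}
    upper: Dict[str, int] = {}
    for stmt in trace:
        if stmt is None:
            continue
        var, op, c = stmt
        if op == ">=":
            lower[var] = max(lower.get(var, c), c)
        else:
            upper[var] = min(upper.get(var, c), c)
    return all(lower[v] <= upper[v] for v in lower if v in upper)


def abstract_trace(trace: List[Stmt], undo_order: Sequence[int]) -> List[Optional[Stmt]]:
    """Replace every statement by skip (None), then undo replacements in
    undo_order until the trace is infeasible again."""
    if feasible(trace):
        raise ValueError("only infeasible traces can be abstracted")
    result: List[Optional[Stmt]] = [None] * len(trace)
    for i in undo_order:
        result[i] = trace[i]          # undo one replacement
        if not feasible(result):      # infeasibility restored: done
            return result
    return result
```

With the trace `[x >= 0, y >= 10, x <= -1, y <= 20]`, the order `[0, 2, 1, 3]` keeps only the two bounds on `x`, whereas `[1, 3, 0, 2]` has to restore the whole trace; this is why the undo order matters.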


At first glance, it seems a good idea to keep applying different techniques to a given trace as long as no perfect sequence of interpolants has been found. However, this turned out to be a bad idea for the following reasons.


1. The path program might be unsafe and we just have to unwind a loop a few 
times until we find a feasible counterexample. 

2. The path program might be so intricate that we are unable to find a loop invariant. However, there are cases where the loop can only be taken a small number of times and our tool can prove correctness by proving infeasibility of each trace individually.

3. The path program might be so intricate that we are unable to find a loop 
invariant immediately. But if we consider certain unwindings of the loop (e.g., 
the loop is taken an even number of times) our interpolants will form a loop 
invariant. 


We conclude that in each iteration of the CEGAR loop (i.e., for each trace) we want to apply only a fixed number of techniques. According to our experiments, some techniques are on average more successful than others; however, no technique is strictly superior to another. Hence, it is a good idea neither to always apply the n typically most successful techniques nor to take n random techniques in each iteration.

We follow an approach that we call path program-based modulation. We have 
a preferred sequence in which we apply our techniques. Whenever we see a new 
trace we start at the beginning of this sequence. Whenever we see a trace that is 
similar to a trace we have already seen, we continue in the sequence of techniques 
at the point where we stopped for the similar trace. Our notion of similarity is: 
Two traces are similar if they have the same path program. 

Hence we make sure that for every path program every technique is eventually 
applied to some trace of the path program. 
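Path program-based modulation can be sketched as one position per path program in the preferred sequence of techniques. In this Python sketch, the path program of a trace is approximated by the set of its statements, the technique names are placeholders, and wrapping around to the start of the sequence after the last technique is our assumption.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Trace = Tuple[str, ...]


class PathProgramModulation:
    """Keeps one position in the preferred technique sequence per path program."""

    def __init__(self, techniques: Sequence[str], per_trace: int) -> None:
        self.techniques = list(techniques)
        self.per_trace = per_trace                 # fixed number per CEGAR iteration
        self.position: Dict[FrozenSet[str], int] = {}

    @staticmethod
    def path_program(trace: Trace) -> FrozenSet[str]:
        # Simplification: two traces with the same set of statements
        # are taken to have the same path program.
        return frozenset(trace)

    def techniques_for(self, trace: Trace) -> List[str]:
        key = self.path_program(trace)
        start = self.position.get(key, 0)          # new path program: start at 0
        n = len(self.techniques)
        chosen = [self.techniques[(start + i) % n] for i in range(self.per_trace)]
        self.position[key] = (start + self.per_trace) % n   # continue here next time
        return chosen
```

A longer unwinding of the same loop maps to the same path program and therefore continues where the previous trace stopped, while a trace through a different path program starts again at the beginning of the sequence.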


2 Project, Setup and Configuration 


AUTOMIZER is developed on top of the open-source program analysis framework ULTIMATE¹. ULTIMATE is mainly developed at the University of Freiburg and has received contributions from more than 50 people. The framework and AUTOMIZER are written in Java, licensed under LGPLv3, and their source code is available on GitHub².

¹ https://ultimate.informatik.uni-freiburg.de.
² https://github.com/ultimate-pa/ultimate.
AUTOMIZER's competition submission is available as a zip archive³. It requires a current Java installation (≥ JRE 1.8) and a working Python 2.7 installation. The archive contains Linux binaries for AUTOMIZER and the required SMT solvers Z3⁴, CVC4⁵, and MATHSAT⁶, as well as a Python script, Ultimate.py. The Python script translates command line parameters and results between ULTIMATE and SV-COMP conventions, and ensures that ULTIMATE is correctly configured to run AUTOMIZER. AUTOMIZER is invoked through Ultimate.py by calling


./Ultimate.py --spec prop.prp --file input.c --architecture 32bit|64bit --full-output [--validate witness.graphml]


where prop.prp is the SV-COMP property file, input.c is the C file that 
should be analyzed, 32bit or 64bit is the architecture of the input file, and 
--full-output enables writing all output instead of just the status of the prop- 
erty to stdout. The option --validate witness.graphml is only used during 
witness validation and allows the specification of a file containing a violation [2] 
or correctness witness [1]. 

Depending on the status of the property, a violation or correctness witness may be written to the file witness.graphml. AUTOMIZER is not only able to generate witnesses, but also to validate them⁷. In any case, the complete output of AUTOMIZER is written to the file Ultimate.log.

The benchmarking tool BENCHEXEC⁸ contains a tool-info module that provides support for AUTOMIZER (ultimateautomizer.py). AUTOMIZER participates in all categories, which is also specified in its SV-COMP benchmark definition⁹ file uautomizer.xml. In its role as witness validator, AUTOMIZER supports all categories except ConcurrencySafety, which is specified in the corresponding SV-COMP benchmark definition files uautomizer-validate-*-witnesses.xml.
Abstract. ULTIMATE TAIPAN is a software model checker that uses trace
abstraction and abstract interpretation to prove correctness of programs. 
In contrast to previous versions, ULTIMATE TAIPAN now uses dynamic 
block encoding to obtain the best precision possible when evaluating 
transition formulas of large block encoded programs. 


1 Verification Approach 


ULTIMATE TAIPAN (or TAIPAN for brevity) is a software model checker that combines trace abstraction [9,10] and abstract interpretation [5]. The algorithm of TAIPAN [8] iteratively refines an abstraction of an input program by analyzing counterexamples (cf. CEGAR [4]).

The initial abstraction of the program is an automaton with the same graph 
structure as the program’s control flow graph, where program locations are states, 
transitions are labeled with program statements, and error locations are accept- 
ing. Thus, the language of the automaton consists of all traces, i.e., sequences of 
statements, that, if executable, lead to an error. In each iteration, the algorithm 
chooses a trace from the language of the current automaton and constructs a path 
program from it. A path program is a projection of the (abstraction of the) program 
to the trace. The algorithm then uses abstract interpretation to compute fixpoints 
for the path program. If the fixpoints of the path program are sufficient to prove 
correctness, i.e., the error location is unreachable, at least the chosen trace and all 
other traces that are covered by the path program are infeasible. The computed 
fixpoints constitute a proof of correctness for the path program and can be repre- 
sented as a set of state assertions. From this set of state assertions, the abstraction 
is refined by constructing a new automaton whose language only consists of infeasi- 
ble traces and then subtracting it from the current abstraction using an automata- 
theoretic difference operation. If abstract interpretation was unable to prove cor- 
rectness of the path program, the algorithm obtains a proof of infeasibility of the 
trace using either interpolating SMT solvers or a combination of unsatisfiable cores 
and strongest post or weakest pre [6]. If the currently analyzed trace is feasible, the trace represents a program execution that can reach the error. If the current automaton becomes empty after a difference operation, all potential error traces have been proven to be infeasible.

procedure foo() {
  var a, b : int;
  assume a <= 5;
  assume b == a;
  assert b <= 5;
}

Fig. 1. Example program. (a) Program in Boogie (shown above). (b) No block encoding: transitions labeled $a' \leq 5$, $b' = a'$, and $b' > 5$ in sequence, ending in the error location. (c) Large block encoding: a single transition labeled with the conjunction of $a' \leq 5$, $b' = a'$, and $b' > 5$.


Dynamic Block Encoding. Large block encoding [1] is a technique to reduce the number of locations in a control flow graph. As TAIPAN relies on trace abstraction, the number of locations determines the performance of the automata operations, which significantly impact the overall performance. It is therefore beneficial to use a strong block encoding that removes as many locations as possible. Unfortunately, the resulting transitions can lead to a loss of precision during the application of an abstract post operator. Consider the example program and its control flow graph with different block encodings shown in Fig. 1. Each control flow graph consists of a set of program locations $LOC$, an initial location, a set of error locations, and a transition relation ${\rightarrow} \subseteq LOC \times TF \times LOC$, which defines the transitions between the locations and labels each transition with a transition formula from the set of transition formulas $TF$. Transition formulas encode the semantics of the program as first-order logic formulas over various SMT theories. In ULTIMATE, a transition formula $\psi$ is a tuple $(\varphi_\psi, IN, OUT, AUX, pv)$, where $\varphi_\psi$ is a closed formula over the three disjoint sets of input ($IN$), output ($OUT$), and auxiliary ($AUX$) variables, and $pv: IN \cup OUT \to V$ is an injective function that maps variables occurring in $\varphi_\psi$ to program variables. We write output variables as primed variables and input variables as unprimed variables.

TAIPAN computes a fixpoint for each location of a control flow graph by (repeatedly) applying an abstract post operator $post^\#$ to these transition formulas. To this end, an abstract domain $\mathcal{D} = (A, \alpha, \gamma, \sqcup, \sqcap, \nabla, post^\#)$ is used, where $A$ is a complete lattice representing all possible abstract states and containing the designated abstract states $\top$ and $\bot$, $\alpha$ is an abstraction function, $\gamma$ is a concretization function, $\sqcup$ is a join operator, $\sqcap$ is a meet operator, $\nabla$ is a widening operator, and $post^\#: A \times TF \to A$ is an abstract transformer that computes an abstract post-state $\sigma'$ from a given abstract pre-state $\sigma$ and a transition formula $\psi$. TAIPAN uses a combination of octagons [11] and sets of divisibility congruences [7] as abstract domain, but for brevity we explain the example using intervals.

Table 1. Application of $post^\#$ for transition formulas from Fig. 1.

Row | Pre-state | Transition formula | Post-state
1 | $\{a: \top, b: \top\}$ | $a' \leq 5$ | $\{a: [-\infty, 5], b: \top\}$
2 | $\{a: [-\infty, 5], b: \top\}$ | $b' = a'$ | $\{a: [-\infty, 5], b: [-\infty, 5]\}$
3 | $\{a: [-\infty, 5], b: [-\infty, 5]\}$ | $b' > 5$ | $\{a: [-\infty, 5], b: \bot\}$
4a | $\{a: \top, b: \top\}$ | $a' \leq 5 \wedge b' > 5 \wedge b' = a'$ | $\{a: [-\infty, 5], b: \bot\}$
4b | $\{a: \top, b: \top\}$ | $b' = a' \wedge a' \leq 5 \wedge b' > 5$ | $\{a: [-\infty, 5], b: [-\infty, 5]\}$

In rows 1 to 3 of Table 1, we apply $post^\#$ of the interval domain in sequence to each of the transition formulas from Fig. 1b. In rows 4a and 4b, we apply the same operator to the only transition formula of Fig. 1c, but process the conjunction in different orders. Although the logical $\wedge$-operator is commutative, the results differ. This is due to different ways of computing the abstract post-state. We can express $post^\#(\sigma, A \wedge B) = \sigma'$ either as $post^\#(\sigma, A) \sqcap post^\#(\sigma, B)$, as $post^\#(post^\#(\sigma, A), B)$, or as $post^\#(post^\#(\sigma, B), A)$. The interval domain cannot express the equality relation between two variables (i.e., the conjunct $b' = a'$); therefore, the first way computes $post^\#(\{a: \top, b: \top\}, b' = a') = \{a: \top, b: \top\}$, effectively rendering the constraint useless. The second and third ways may succeed, depending on the ordering of conjuncts. In general, the ordering is important, but in our example, it does not matter as long as $b' = a'$ is not first.
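The three ways of computing $post^\#(\sigma, A \wedge B)$ can be replayed on the example with a tiny interval domain in Python. This is an illustrative sketch with simplifications of our own: primes are dropped (the example contains only assumptions, no assignments), variables are integers so $b > 5$ becomes the bound $[6, \infty]$, a state with an empty interval collapses to the bottom state `None`, and the equality $b = a$ is handled by intersecting the two intervals, which is all the interval domain can do.

```python
import math
from typing import Dict, List, Optional, Tuple, Union

INF = math.inf
TOP = (-INF, INF)
Interval = Tuple[float, float]
State = Optional[Dict[str, Interval]]        # None is the bottom state
Conjunct = Tuple[str, str, Union[int, str]]  # ("le"|"gt", var, const) or ("eq", var, var)


def meet_iv(x: Interval, y: Interval) -> Optional[Interval]:
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo <= hi else None


def meet(s: State, t: State) -> State:
    if s is None or t is None:
        return None
    out = {}
    for v in s:
        iv = meet_iv(s[v], t[v])
        if iv is None:
            return None                      # an empty interval means bottom
        out[v] = iv
    return out


def post(s: State, c: Conjunct) -> State:
    """Abstract post of one conjunct in the interval domain."""
    if s is None:
        return None
    kind, x, y = c
    if kind == "le":                         # x <= y, y a constant
        return meet(s, {**s, x: (-INF, y)})
    if kind == "gt":                         # x > y for integers: x >= y + 1
        return meet(s, {**s, x: (y + 1, INF)})
    iv = meet_iv(s[x], s[y])                 # "eq": intervals can only intersect
    return None if iv is None else {**s, x: iv, y: iv}


def post_meet(s: State, conjuncts: List[Conjunct]) -> State:
    """First way: meet of the posts of the individual conjuncts."""
    result = s
    for c in conjuncts:
        result = meet(result, post(s, c))
    return result


def post_seq(s: State, conjuncts: List[Conjunct]) -> State:
    """Second and third way: apply the conjuncts one after another."""
    for c in conjuncts:
        s = post(s, c)
    return s


s0 = {"a": TOP, "b": TOP}
A, B, E = ("le", "a", 5), ("gt", "b", 5), ("eq", "b", "a")
print(post_meet(s0, [A, B, E]))   # b = a has no effect; the state is not bottom
print(post_seq(s0, [A, B, E]))    # None (bottom)
print(post_seq(s0, [E, A, B]))    # b = a applied first; the state is not bottom
```

In this sketch, applying the equality after at least one of the bounds derives bottom, while applying it first, or only via the meet of individual posts, loses it.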

In TAIPAN, we solve this problem by introducing the notion of expressibility to an abstract domain. We augment each abstract domain with an expressibility predicate $ex$, which decides for each non-logical symbol of a transition formula (i.e., each relation, function application, variable, and constant) whether it can be represented in the domain. For example, the interval domain can represent all relations that contain at most one variable, while octagons can represent all relations of the form $\pm x \pm y \leq c$. We then apply $post^\#$ to the conjuncts of a transition formula in an order induced by $ex$, thus effectively choosing a new dynamic block encoding. For $post^\#(\sigma, \varphi)$, our algorithm computes $\sigma'$ by first converting the formula $\varphi$ to DNF such that $\varphi = \varphi_0 \vee \varphi_1 \vee \dots \vee \varphi_n$. For each disjunct $\varphi_i = \varphi_i^0 \wedge \varphi_i^1 \wedge \dots \wedge \varphi_i^m$, we compute $post^\#(\sigma, \varphi_i) = \sigma'_i$ as follows:


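1. Partition the conjuncts into two classes. The first class contains the conjuncts for which $ex$ is true, the second those for which $ex$ is false.

2. First, compute the abstract post for the conjunction of all expressible conjuncts: $\bigsqcap_{j \,:\, ex(\varphi_i^j)} post^\#(\sigma, \varphi_i^j) = \sigma_0$.

3. Compute the abstract post for all non-expressible conjuncts successively, using the post-state of the k-th application as the pre-state of the (k+1)-th application, and the post-state of the last application as the final result $\sigma'_i$ for the disjunct $\varphi_i$:

   $post^\#(\sigma_k, \hat{\varphi}_i^{k}) = \sigma_{k+1}$,

   where $\sigma_0$ is the result of step 2 and $\hat{\varphi}_i^{k}$ is the k-th non-expressible conjunct of $\varphi_i$.

The result for $post^\#(\sigma, \varphi)$ is then $\bigsqcup_{i=0}^{n} \sigma'_i = \sigma'$.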
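The three steps can be sketched in Python for the interval domain, where $ex$ holds exactly for conjuncts over at most one variable. The sketch assumes that the formula is already in DNF, given as a list of conjunct lists; it decides expressibility per conjunct rather than per non-logical symbol, treats variables as integers, drops primes, and represents the bottom state as `None`. The conjunct encoding and all function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple, Union

INF = math.inf
Interval = Tuple[float, float]
State = Optional[Dict[str, Interval]]        # None is the bottom state
Conjunct = Tuple[str, str, Union[int, str]]  # ("le"|"gt", var, const) or ("eq", var, var)


def meet_iv(x: Interval, y: Interval) -> Optional[Interval]:
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo <= hi else None


def meet(s: State, t: State) -> State:
    if s is None or t is None:
        return None
    out = {}
    for v in s:
        iv = meet_iv(s[v], t[v])
        if iv is None:
            return None
        out[v] = iv
    return out


def join(s: State, t: State) -> State:
    if s is None:
        return t
    if t is None:
        return s
    return {v: (min(s[v][0], t[v][0]), max(s[v][1], t[v][1])) for v in s}


def post(s: State, c: Conjunct) -> State:
    if s is None:
        return None
    kind, x, y = c
    if kind == "le":
        return meet(s, {**s, x: (-INF, y)})
    if kind == "gt":
        return meet(s, {**s, x: (y + 1, INF)})
    iv = meet_iv(s[x], s[y])                 # "eq": intersect both intervals
    return None if iv is None else {**s, x: iv, y: iv}


def ex(c: Conjunct) -> bool:
    # Interval domain: only relations over at most one variable are expressible.
    return c[0] in ("le", "gt")


def post_dnf(s: State, disjuncts: List[List[Conjunct]]) -> State:
    """post#(s, phi) for phi in DNF, ordering conjuncts by expressibility."""
    result: State = None                     # bottom is neutral for join
    for conjuncts in disjuncts:
        # Step 1: partition by the expressibility predicate.
        expressible = [c for c in conjuncts if ex(c)]
        others = [c for c in conjuncts if not ex(c)]
        # Step 2: meet of the posts of all expressible conjuncts.
        sigma = s
        for c in expressible:
            sigma = meet(sigma, post(s, c))
        # Step 3: apply the remaining conjuncts one after another.
        for c in others:
            sigma = post(sigma, c)
        result = join(result, sigma)         # join over the disjuncts
    return result


s0 = {"a": (-INF, INF), "b": (-INF, INF)}
A, B, E = ("le", "a", 5), ("gt", "b", 5), ("eq", "b", "a")
print(post_dnf(s0, [[E, A, B]]))   # None (bottom), although b = a is listed first
```

Because the expressible bounds are always combined before the equality is applied, the result no longer depends on the order in which the conjuncts appear in the formula.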
2 Project, Setup and Configuration


TAIPAN is a part of the open-source program analysis framework ULTIMATE¹, written in Java, licensed under LGPLv3², and open source³. The TAIPAN competition submission is available as a zip archive⁴. It requires a current Java installation (≥ JRE 1.8) and a working Python 2.7 installation. The submission contains an executable version of TAIPAN for Linux platforms, the binaries of the required SMT solvers Z3⁵, CVC4⁶, and MATHSAT⁷, as well as a Python script, Ultimate.py, which maps the SV-COMP interface to ULTIMATE's command line interface and selects the correct settings and the correct toolchain. In SV-COMP, TAIPAN is invoked through Ultimate.py with


./Ultimate.py --spec prop.prp --file input.c --architecture 32bit|64bit --full-output


where prop.prp is the SV-COMP property file, input.c is the C file that should be analyzed, 32bit or 64bit is the architecture of the input file, and --full-output enables writing all output instead of just the status of the property to stdout. The complete output of TAIPAN is also written to the file Ultimate.log. Depending on the status of the property, a violation [3] or correctness [2] witness may be written to the file witness.graphml.

The benchmarking tool BENCHEXEC⁸ supports TAIPAN through the tool-info module ultimatetaipan.py. TAIPAN participates in all categories, as specified by its SV-COMP benchmark definition⁹ file utaipan.xml.
Abstract. VeriAbs is a portfolio software verifier for ANSI-C programs. To prove properties with better efficiency and scalability, this version implements output abstraction with k-induction in the presence of resets. VeriAbs now generates postconditions over the abstraction to find invariants by applying Z3's quantifier elimination tactics. These invariants are then used to generate validation witnesses. To find errors in the absence of known program bounds, VeriAbs searches for property-violating inputs by applying random test generation with fuzz testing, for better scalability compared to bounded model checking.


1 Verification Approach 


Background. VeriAbs has implemented abstract acceleration [5] and k-induction techniques to scale Bounded Model Checking (BMC) to programs with loops of large or unknown bounds. VeriAbs abstracts such loops to loops with known small bounds, which can be verified by BMC. This abstraction is achieved by accelerating selected variables processed inside loops. Further, VeriAbs applies incremental k-induction to improve precision. Loops processing arrays of large or unknown sizes are substituted by abstract loops that execute a small, non-deterministically chosen sequence of original loop iterations. The idea is based on the concept of loop shrinkability [10].


1.1 Tool Enhancements 


For SV-COMP 2018, VeriAbs has been supplemented with an efficient imple- 
mentation of output abstraction to prove properties, random test generation 
with fuzzing to find errors, and witness generation. 


Output Abstraction. The SV-COMP 2017 version of VeriAbs could not precisely verify programs with loops in which all variables are modified by non-linear arithmetic expressions or resets. For such programs, the current version applies an improved output abstraction [13] that simply replaces the corresponding loop with non-deterministic assignments to all the modified variables.

P. Darke: Jury member.

© The Author(s) 2018
D. Beyer and M. Huisman (Eds.): TACAS 2018, LNCS 10806, pp. 457-462, 2018.
https://doi.org/10.1007/978-3-319-89963-3_32


Search for Property Violating Inputs. To compensate for the lack of abstraction refinement, VeriAbs searches for a property-violating input. To this end, it uses fuzz testing to search for an input that reaches the error location. Fuzz testing is a testing technique that aims to uncover run-time errors by executing the target program with a large number of inputs generated automatically and systematically. Grey-box fuzzing [3] is a fuzz testing technique that uses lightweight instrumentation to observe the target program's behavior on a test run. It uses this information to generate new test inputs that might exhibit new program behaviors. VeriAbs uses American Fuzzy Lop (AFL-fuzz) [12] as the fuzz testing tool.
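The following Python sketch shows the simplest form of random test generation against a program model. It is a plain black-box random tester, not AFL: grey-box fuzzers such as AFL additionally use lightweight instrumentation to keep and mutate inputs that exercise new behavior, which is omitted here. Raising `AssertionError` is our stand-in for reaching the error location, and the names `fuzz`, `trials`, and `seed` are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence


def fuzz(
    target: Callable[[Sequence[int]], None],
    n_inputs: int,
    trials: int = 10_000,
    seed: int = 0,
) -> Optional[List[int]]:
    """Call `target` with random integer vectors and return the first input
    that makes it raise AssertionError, or None if no such input is found."""
    rng = random.Random(seed)                  # seeded, so runs are reproducible
    for _ in range(trials):
        inputs = [rng.randint(-128, 127) for _ in range(n_inputs)]
        try:
            target(inputs)
        except AssertionError:
            return inputs                      # property-violating input found
    return None


def program(inputs: Sequence[int]) -> None:
    # Hypothetical program model with an error location guarded by two branches.
    x, y = inputs
    if x > 100 and y < -100:
        raise AssertionError("error location reached")


print(fuzz(program, n_inputs=2))
```

Unlike BMC, such a search needs no bound on loops; it can only find errors, never prove their absence.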


Witness Generation. The previous version of VeriAbs used CPAchecker [2] to generate validation witnesses from abstract programs. The SV-COMP 2018 version implements techniques for generating both correctness and error witnesses. If VeriAbs concludes that the input program is safe, it generates a correctness witness with loop invariants. These invariants are generated by computing the strongest postcondition using the methods presented in [8], except for loops where the loop acceleration information is used instead. These invariants can contain quantifiers and non-program variables. However, SV-COMP 2017 witness validators recognize only invariants that are expressed as C expressions over program variables. VeriAbs uses Z3 [6] to eliminate quantifiers and non-program variables from the invariants. The resulting invariants are added to the control flow automaton generated by CPAchecker to produce the validation witness.


The technique used to generate an error witness depends on the strategy that 
was used to falsify the input program. When VeriAbs decides that the input 
program is unsafe by fuzz testing (i.e., using AFL-fuzz [12]), it generates a 
violation witness with a valuation of variables at the program points that 
assign non-deterministic values to program variables. This is achieved by 
replaying the execution that caused the property violation on an instrumented 
version of the input program, which prints the aforementioned valuation. To 
avoid file I/O latency, this instrumented program is used only to replay the 
error execution. The variable values thus obtained are used to generate the 
error witness. If, on the other hand, the input program was found unsafe by 
BMC, then the corresponding error witness produced by the bounded model 
checker is used. 
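The replay step can be sketched in C as follows. All names are hypothetical: each non-deterministic assignment is instrumented to take its value from the input found by the fuzzer and to print the program point (here, `__LINE__`) together with the value consumed; these pairs are the information a violation witness needs. The actual instrumentation in VeriAbs may differ.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LOG 16

/* Hypothetical replay state: the failing input found by the fuzzer. */
static const int *replay_vals;
static int replay_len, replay_pos;

/* Hypothetical record of (program point, value) pairs for the witness. */
static int log_line[MAX_LOG], log_val[MAX_LOG], log_n;

/* Instrumented replacement for a non-deterministic assignment: returns the
   next replayed value and prints/records where it was consumed. */
static int nondet_int_at(int line) {
    int v = replay_pos < replay_len ? replay_vals[replay_pos++] : 0;
    if (log_n < MAX_LOG) {
        log_line[log_n] = line;
        log_val[log_n] = v;
        log_n++;
    }
    printf("line %d: value %d\n", line, v);
    return v;
}

/* Hypothetical program under test, with its nondet calls instrumented. */
static int program(void) {
    int x = nondet_int_at(__LINE__);
    int y = nondet_int_at(__LINE__);
    if (x > 10 && y == x + 1)
        return 1;                    /* error location reached */
    return 0;
}

/* Replays one input and returns 1 if it reaches the error location. */
int replay(const int *vals, int len) {
    replay_vals = vals;
    replay_len = len;
    replay_pos = 0;
    log_n = 0;
    return program();
}
```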


Array Loop Abstraction. We abstract loops that process arrays of large or 
unknown size and whose property is quantified over array elements, using a 
method based on the idea of loop shrinkability [10]. We call an array-processing 
loop k-shrinkable if the original program is guaranteed to be correct whenever 
executing every sequence of k iterations of the original loop satisfies the 
property projected onto the chosen sequence. A k-shrinkable loop is replaced 
with an abstract loop that executes a non-deterministically chosen sequence of 
k iterations of the original loop, and the property is likewise translated to 
be checked only over the array elements corresponding to the chosen sequence 
of iterations. The k-shrinkability criterion ensures that if the program is 
incorrect, then the translated property will be violated in the abstract 
program for some sequence of k iterations. 
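A minimal sketch for k = 1, assuming that iteration i writes only `a[i]` and that the property is `forall i. a[i] == 2*i`. The original program needs an array of size n; the abstract program keeps a single cell for the one iteration j it executes. A verifier would treat j as a non-deterministic index in [0, n); here it is passed in as a parameter, and the `bug` flag that injects a fault at index 7 is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Value written by iteration i; `bug` (hypothetical) injects a fault at 7. */
static int body_value(int i, int bug) { return (bug && i == 7) ? -1 : 2 * i; }

/* Original program: fill a[0..n-1], then check forall i. a[i] == 2*i.
   Returns 1 if the property holds, 0 if not, -1 on allocation failure. */
int original_holds(int n, int bug) {
    if (n <= 0)
        return 1;                          /* property holds vacuously */
    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        a[i] = body_value(i, bug);
    int ok = 1;
    for (int i = 0; i < n; i++)
        if (a[i] != 2 * i)
            ok = 0;
    free(a);
    return ok;
}

/* Abstract program for k = 1: execute only iteration j (chosen
   non-deterministically by a verifier; passed in here) and check the
   property projected onto the single element it writes. */
int abstract_holds(int j, int bug) {
    int aj = body_value(j, bug);           /* the one array cell that is kept */
    return aj == 2 * j;
}
```

In this example the abstract program holds for every j exactly when the original holds, but its size no longer depends on n, so BMC can check it without unwinding a loop of unknown length.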


2 Verification Process and Software Architecture 


The verification process of VeriAbs is shown in Fig. 1. VeriAbs passes the input 
C file to a Tata Consultancy Services (TCS) [1] in-house C front end to generate 
the intermediate representation (IR) of the program. It then analyzes this IR 
using PRISM, a TCS in-house program analysis framework [9] to perform the 
abstractions and instrumentation. It uses the C Bounded Model Checker (CBMC) [4] 
version 5.8 with MINISAT [7] to validate the abstract program, or the original 
program when its loops have known bounds. VeriAbs generates correctness 
witnesses by computing loop invariants using strongest postconditions. It uses 
Z3 version 4.5.1 to eliminate quantifiers, since SV-COMP requires invariants to 
be expressed as C expressions. These simplified invariants are added to the 
control flow automaton generated by CPAchecker version 1.6.1 [2]. VeriAbs uses 
CBMC version 5.8 for generating error witnesses. For fuzz testing, VeriAbs uses 
AFL-fuzz [12] version 2.35b. It invokes CBMC and AFL-fuzz sequentially for 
program falsification. 
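The sequential use of the falsifiers can be pictured as a simple driver loop. In this C sketch the stage signature, the verdict type and both stage functions are hypothetical stubs (a real stage would run CBMC or AFL-fuzz on the program file and interpret its output); only the control flow, trying each falsifier in turn and stopping at the first one that finds a violation, mirrors the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum verdict { VERDICT_UNKNOWN, VERDICT_FALSE };

/* Hypothetical signature of one falsification stage. */
typedef enum verdict (*stage_fn)(const char *program);

/* Hypothetical stubs: real stages would run CBMC and AFL-fuzz. */
static enum verdict bmc_stage(const char *program) {
    return strstr(program, "shallow_bug") ? VERDICT_FALSE : VERDICT_UNKNOWN;
}
static enum verdict fuzz_stage(const char *program) {
    return strstr(program, "bug") ? VERDICT_FALSE : VERDICT_UNKNOWN;
}

/* Runs the stages one after another and stops at the first violation.
   *stage_used receives the index of that stage, or n if none succeeded. */
enum verdict run_sequentially(const stage_fn *stages, size_t n,
                              const char *program, size_t *stage_used) {
    for (size_t k = 0; k < n; k++) {
        if (stages[k](program) == VERDICT_FALSE) {
            *stage_used = k;
            return VERDICT_FALSE;
        }
    }
    *stage_used = n;
    return VERDICT_UNKNOWN;
}
```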


[Figure: VeriAbs verification pipeline. Recoverable labels: C program; invariant generation & simplification; simplified loop invariants; Bounded Model Checker; verification result; validation witness.]

Fig. 1. The verification process of VeriAbs (enhancements are highlighted)


The SV-COMP 2018 version of VeriAbs first analyzes every loop to check whether 
it contains linear modifications to numerical variables that can be precisely 
validated by Loop Abstraction for BMC (LABMC) [5]. If this check passes, it 
applies a range analysis [11] to identify the ranges of those variables. When 
all variables are modified non-linearly, the simpler output abstraction is 
applied instead. If the loop reads or modifies arrays, VeriAbs applies the 
array loop abstraction explained in Sect. 1, and then applies BMC to validate 
the abstraction. To find errors, VeriAbs uses grey-box fuzzing with AFL, and 
it uses the new program instrumentation to generate violation witnesses for 
such programs. 
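The per-loop choice described above can be summarized as a small decision function. This C sketch is one reading of the text: the `loop_info` summary and the flag names are hypothetical; a loop with at least one linearly modified variable goes to LABMC (after range analysis), a loop whose variables are all modified non-linearly gets the output abstraction, and array access adds the array loop abstraction on top.

```c
#include <assert.h>

/* Hypothetical summary of one loop, as a static analysis might report it. */
struct loop_info {
    int linear_vars;     /* numeric variables updated by linear expressions */
    int nonlinear_vars;  /* numeric variables updated non-linearly or reset */
    int uses_arrays;     /* nonzero if the loop reads or writes arrays */
};

/* Bit flags, since an array loop may also need a numeric abstraction. */
enum {
    ABS_LABMC  = 1 << 0,  /* LABMC, after range analysis of linear variables */
    ABS_OUTPUT = 1 << 1,  /* output abstraction: havoc modified variables */
    ABS_ARRAY  = 1 << 2   /* k-shrinkable array loop abstraction */
};

int choose_abstractions(const struct loop_info *l) {
    int chosen = 0;
    if (l->linear_vars > 0)
        chosen |= ABS_LABMC;
    else if (l->nonlinear_vars > 0)
        chosen |= ABS_OUTPUT;
    if (l->uses_arrays)
        chosen |= ABS_ARRAY;
    return chosen;
}
```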
3 Strengths and Weaknesses 


The main strength of VeriAbs is that it is sound. All transformations 
implemented by the tool are over-approximations. CBMC provides an option 
(--unwinding-assertions) that ensures the loops are unwound sufficiently for 
proving the property. Hence, if the tool reports that a property holds, it 
indeed holds. Another key strength is that it transforms all loops in a program 
into abstract loops with a known finite number of iterations, enabling the use 
of bounded model checkers for proving properties. The main weakness of the tool 
is that it does not implement a refinement process that is well suited to 
finding errors. However, it can still find errors using fuzz testing and bounded 
model checking. VeriAbs depends on Z3 for eliminating quantifiers and 
non-program variables from correctness witness invariants, and on CPAchecker 
for generating program automata. Compared with its SV-COMP 2017 version, 
VeriAbs performed significantly better this year in the Arrays, Loops, ECA, 
Sequentialized, and Recursive subcategories. 
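The role of CBMC's `--unwinding-assertions` option can be shown with a concrete analogue in C. CBMC unrolls each loop a bounded number of times (set with `--unwind`) and reasons symbolically over all inputs; the hypothetical `bounded_check` below instead runs one concrete input `n` through K copies of the loop body. Without the unwinding assertion, iterations beyond K are silently cut off and a violation can be missed; with it, an insufficient bound is reported instead of a false proof.

```c
#include <assert.h>

enum bmc_result { HOLDS, VIOLATED, UNWINDING_INSUFFICIENT };

/* Concrete analogue of bounded checking for the hypothetical program
       s = 0; i = 0; while (i < n) { s += 2; i++; }  assert(s <= 10);
   with the loop unrolled K times. */
enum bmc_result bounded_check(int n, int K, int unwinding_assertions) {
    int s = 0, i = 0;
    for (int copy = 0; copy < K && i < n; copy++) {  /* K copies of the body */
        s += 2;
        i++;
    }
    if (i < n && unwinding_assertions)
        return UNWINDING_INSUFFICIENT;  /* loop guard still true after K copies */
    /* Without the assertion, iterations beyond K are simply dropped. */
    return s <= 10 ? HOLDS : VIOLATED;
}
```

For example, `bounded_check(8, 3, 0)` reports `HOLDS` although eight iterations give s = 16 > 10, while `bounded_check(8, 3, 1)` reports `UNWINDING_INSUFFICIENT`; only with a sufficient bound, as in `bounded_check(8, 10, 1)`, is the violation found.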


4 Tool Setup and Configuration 


The VeriAbs SV-COMP 2018 executable is available for download at the 
URL http://www.cmi.ac.in/~madhukar/veriabs/VeriAbs.zip. To install the tool, 
download the archive, extract its contents, and then follow the installation 
instructions in VeriAbs/INSTALL.txt. To execute VeriAbs, the user needs to 
specify the property file of the respective verification category using the 
--property-file option. The witness is generated in the current working direc- 
tory as witness.graphml. A sample command is as follows: 
VeriAbs/scripts/veriabs --property-file ALL.prp example.c 


VeriAbs is participating in the ReachSafety category. The BenchExec wrap- 
per script for the tool is veriabs.py and veriabs.xml is the benchmark descrip- 
tion file. 


5 Software Project and Contributors 


VeriAbs is a verification tool maintained by TCS Research [1], and parts of it 
have been developed by the authors, Mohammad Afzal and other members of 
this organization. We would like to thank Charles Babu M and other interns 
who have contributed to the development of VeriAbs. 